Mender
Self-healing for production agents.

incident a3cb47086f88 dismissed

Cluster

pattern: ambiguous source currency silently defaulted to USD
traces: 3
project: finpay-support
created: May 7 22:31:09
updated: May 7 22:35:14

Baseline (live)

9/10
90% pass · 47s

Staged (patched)

9/10
90% pass · 75s
+0% lift

Hypothesis

The 'Be concise and direct' instruction introduced in v2 and retained in v3 encourages the model to provide immediate answers, causing it to skip the required clarification step and revert to an internal USD default bias.

Suspected prompt clause

Be concise and direct.
recommended: Replace 'Be concise and direct' with the v1 instruction 'Be concise, polite, and accurate' and restore the explicit 'Never silently assume a default currency' constraint.
evidence: This clause was introduced in v2 (released 2026-05-05T12:47:00+00:00) alongside the regression, and the silent USD defaulting persists in the current v3 version despite the removal of the explicit assumption clause.
confidence: 0.70

Mender self-eval — how well did this cycle perform

overall: 0.12
hypothesis correctness: 0.00
fix effectiveness: 0.00
eval set quality: 0.14
token efficiency: 1.00

lift=+0%; hyp=0.00; evalq=0.14; tok=1.00

Cycle parameters — self-tuned at start of cycle

eval_target_count: 8
min_hypothesis_confidence: 0.6
min_lift: 0.25
cluster_max_failures: 20

Past-cycle introspection

n_cycles_seen: 0
trend: insufficient-data

Proposed patch — v3 → v4

Replaced 'direct' with 'polite, and accurate' and added an explicit prohibition against assuming a default currency.

--- finpay-support/v3
+++ finpay-support/v4
@@ -1,3 +1,3 @@
-You are FinPay Support. Be concise and direct. Reply in one or two sentences.
+You are FinPay Support. Be concise, polite, and accurate. Reply in one or two sentences.
 Capabilities:
@@ -5,3 +5,3 @@
 - Convert amounts between currencies using `get_exchange_rate`.
-Ask for clarification if the source currency is not specified or ambiguous.
+Ask for clarification if the source currency is not specified or ambiguous. Never silently assume a default currency.
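A diff of this shape can be reproduced with Python's standard `difflib`. The sketch below is illustrative only: it uses just the prompt lines visible in the patch (the rest of the prompt is not shown here), and the names `V3_EXCERPT`, `V4_EXCERPT`, and `prompt_diff` are hypothetical, not part of Mender. Because the excerpts omit lines, the hunk headers it prints will differ from those in the patch.

```python
import difflib

# Excerpts only: these are the prompt lines visible in the patch.
# The full v3/v4 prompts are not shown in the report.
V3_EXCERPT = [
    "You are FinPay Support. Be concise and direct. Reply in one or two sentences.",
    "Capabilities:",
    "- Convert amounts between currencies using `get_exchange_rate`.",
    "Ask for clarification if the source currency is not specified or ambiguous.",
]

V4_EXCERPT = [
    "You are FinPay Support. Be concise, polite, and accurate. Reply in one or two sentences.",
    "Capabilities:",
    "- Convert amounts between currencies using `get_exchange_rate`.",
    "Ask for clarification if the source currency is not specified or ambiguous. "
    "Never silently assume a default currency.",
]


def prompt_diff(old, new):
    """Return unified-diff lines between two prompt versions (lists of lines)."""
    return list(
        difflib.unified_diff(
            old,
            new,
            fromfile="finpay-support/v3",
            tofile="finpay-support/v4",
            lineterm="",
        )
    )


diff_lines = prompt_diff(V3_EXCERPT, V4_EXCERPT)
print("\n".join(diff_lines))
```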

Eval cases

case · latency · baseline (live) · staged · baseline judge
ambiguous-source-eur · 6224ms · pass · pass · The agent correctly asked for the source currency and did not provide a conversion based on an assumed currency.
ambiguous-source-gbp-shorthand · 3896ms · pass · pass · The agent correctly identified the missing source currency and requested clarification as required by the rubric.
ambiguous-source-cad-natural · 4717ms · pass · pass · The agent followed the rubric by asking for clarification on the input currency without making any assumptions.
ambiguous-source-jpy-business · 4779ms · pass · pass · The agent correctly requested the source currency as required by the rubric without making any assumptions.
ambiguous-source-aud-informal · 4283ms · pass · pass · The agent correctly identified the missing source currency and requested clarification as required by the rubric.
ambiguous-source-chf-transfer · 4357ms · pass · pass · The agent correctly identifies the target currency as CHF and asks for clarification regarding the ambiguous source currency.
control-explicit-usd-eur · 4736ms · pass · pass · The agent provided a direct numerical conversion from USD to EUR as requested without asking for clarification.
control-general-fees · 5093ms · pass · pass · The agent clearly explains the fee structure and directs the user to the app settings as required.
control-explicit-gbp-mxn · 4953ms · fail · fail · The agent asked for clarification instead of providing the GBP to MXN conversion as required by the rubric.
adversarial-rate-no-amount · 4241ms · pass · pass · The agent provided the exchange rate directly without seeking clarification, adhering to the rubric's specific instructions.
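The headline baseline figures can be cross-checked against the per-case rows. The sketch below assumes the millisecond value listed for each case is the baseline (live) latency, which the report does not label explicitly; `CASES` and `summarize` are illustrative names, not part of Mender.

```python
# (case, latency_ms, baseline verdict) copied from the eval cases above.
# Assumption: the latency shown for each case belongs to the baseline (live) run.
CASES = [
    ("ambiguous-source-eur", 6224, "pass"),
    ("ambiguous-source-gbp-shorthand", 3896, "pass"),
    ("ambiguous-source-cad-natural", 4717, "pass"),
    ("ambiguous-source-jpy-business", 4779, "pass"),
    ("ambiguous-source-aud-informal", 4283, "pass"),
    ("ambiguous-source-chf-transfer", 4357, "pass"),
    ("control-explicit-usd-eur", 4736, "pass"),
    ("control-general-fees", 5093, "pass"),
    ("control-explicit-gbp-mxn", 4953, "fail"),
    ("adversarial-rate-no-amount", 4241, "pass"),
]


def summarize(cases):
    """Return (passed, total, pass_rate, total_seconds) for a list of cases."""
    total = len(cases)
    passed = sum(1 for _, _, verdict in cases if verdict == "pass")
    total_ms = sum(ms for _, ms, _ in cases)
    return passed, total, passed / total, total_ms / 1000


passed, total, rate, seconds = summarize(CASES)
print(f"{passed}/{total} · {rate:.0%} pass · {seconds:.0f}s")
# prints: 9/10 · 90% pass · 47s
```

The latencies sum to 47279 ms, which rounds to the 47s reported for the baseline run. That agreement is what supports reading the per-case timings as baseline rather than staged.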

State history

at from to note
May 7 22:31:09 detected detected 3 affected traces
May 7 22:32:43 detected hypothesized Be concise and direct.
May 7 22:33:51 hypothesized evaluating baseline 9/10 pass
May 7 22:35:14 evaluating dismissed insufficient lift +0% (need +25%)
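The final transition can be read as a threshold check against the cycle parameters. The following is a minimal sketch of such a gate, not Mender's actual logic. It assumes lift is the absolute difference in pass rate between staged and baseline, that a hypothesis below `min_hypothesis_confidence` would also be dismissed, and that a successful outcome is called `accepted`; the report confirms none of these, and `CycleParams`, `lift`, and `decide` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CycleParams:
    # Values taken from the cycle parameters in this report.
    min_hypothesis_confidence: float = 0.6
    min_lift: float = 0.25


def lift(baseline_passed, staged_passed, total):
    """Assumed definition: staged pass rate minus baseline pass rate."""
    return (staged_passed - baseline_passed) / total


def decide(confidence, baseline_passed, staged_passed, total, params):
    """Return (state, note) for a patch under the assumed gating rules."""
    if confidence < params.min_hypothesis_confidence:
        return "dismissed", "hypothesis confidence below threshold"
    observed = lift(baseline_passed, staged_passed, total)
    if observed < params.min_lift:
        note = f"insufficient lift {observed:+.0%} (need {params.min_lift:+.0%})"
        return "dismissed", note
    # "accepted" is a placeholder state name; the report shows no successful cycle.
    return "accepted", f"lift {observed:+.0%}"


state, note = decide(0.70, 9, 9, 10, CycleParams())
print(state, "-", note)
# prints: dismissed - insufficient lift +0% (need +25%)
```

Under this reading, a baseline that already passes 9/10 leaves at most +10% of headroom, so a `min_lift` of 0.25 could not be met on this eval set regardless of the patch.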