Mender
Self-healing for production agents.

incident fb692b49f170 dismissed

Cluster

pattern: silent default to USD for ambiguous source currency
traces: 3
project: finpay-support
created: May 7 23:15:57
updated: May 7 23:20:45

Baseline (live)

10/10
100% pass · 45s

Staged (patched)

9/10
90% pass · 105s
-10% lift

Hypothesis

The 'Be concise and direct' instruction introduced in v2 creates a persona that prioritizes immediate execution over the clarification requirement, causing the model to skip the questioning step in favor of its latent USD-centric bias.

Suspected prompt clause

Be concise and direct.
recommended: Replace 'Be concise and direct' with 'Be concise and accurate' and reinstate the explicit 'Never silently assume a default currency' instruction from v1.
evidence: The clause was added in v2 (released 2026-05-05T12:47:00+00:00), coinciding with the introduction of the silent USD default behavior, which persists in v3.
confidence: 0.80

Mender self-eval — how well did this cycle perform

overall: 0.13
hypothesis correctness: 0.00
fix effectiveness: 0.00
eval set quality: 0.20
token efficiency: 1.00

lift=-10%; hyp=0.00; evalq=0.20; tok=1.00

Cycle parameters — self-tuned at start of cycle

eval_target_count: 8
min_hypothesis_confidence: 0.6
min_lift: 0.25
cluster_max_failures: 20
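
How these thresholds relate to the outcome of this cycle can be sketched in Python. This is a minimal illustration under stated assumptions, not Mender's actual implementation: the names `CycleParams` and `gate_cycle`, the decision strings, and the choice to measure lift as the absolute change in pass rate are all hypothetical. Only the parameter names and the numbers come from this report.

```python
from dataclasses import dataclass


@dataclass
class CycleParams:
    # Values copied from the "Cycle parameters" block of this report.
    min_hypothesis_confidence: float = 0.6
    min_lift: float = 0.25


def gate_cycle(confidence, baseline_pass, staged_pass, total, params):
    """Hypothetical gate: returns (decision, lift)."""
    # A hypothesis below the confidence floor is assumed to never reach evaluation.
    if confidence < params.min_hypothesis_confidence:
        return "skip-evaluation", None
    # Assumption: lift is the absolute change in pass rate (staged minus baseline).
    lift = (staged_pass - baseline_pass) / total
    if lift < params.min_lift:
        return "dismissed", lift
    return "accepted", lift


# Numbers from this incident: confidence 0.80, baseline 10/10, staged 9/10.
decision, lift = gate_cycle(0.80, 10, 9, 10, CycleParams())
print(decision, f"{lift:+.0%}")  # dismissed -10%
```

With a confidence of 0.80 the hypothesis clears the 0.6 floor, but a lift of -10% falls short of the required +25%, which matches the dismissal recorded in the state history.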

Past-cycle introspection

n_cycles_seen: 0
trend: insufficient-data

Proposed patch — v3 → v4

Replaced 'direct' with 'accurate' and added an explicit prohibition against assuming a default currency.

--- finpay-support/v3
+++ finpay-support/v4
@@ -1,3 +1,3 @@
-You are FinPay Support. Be concise and direct. Reply in one or two sentences.
+You are FinPay Support. Be concise and accurate. Reply in one or two sentences.
 Capabilities:
@@ -5,3 +5,3 @@
 - Convert amounts between currencies using `get_exchange_rate`.
-Ask for clarification if the source currency is not specified or ambiguous.
+Ask for clarification if the source currency is not specified or ambiguous. Never silently assume a default currency.

Eval cases

case | latency | baseline (live) | staged | baseline judge
ambiguous-source-eur-clarification | 4178ms | pass | pass | The agent correctly asked for the source currency without making assumptions or performing calculations as required by the rubric.
ambiguous-source-gbp-direct | 6991ms | pass | pass | The agent correctly requested the source currency as specified in the rubric.
ambiguous-source-jpy-natural | 5593ms | pass | pass | The agent correctly identified the missing source currency and asked the user for clarification as required by the rubric.
ambiguous-source-chf-short | 2803ms | pass | pass | The agent correctly asked for the source currency without assuming a default value.
ambiguous-source-mxn-rate | 3122ms | pass | pass | The agent correctly asked for the source currency before providing a conversion value.
ambiguous-source-cad-question | 3602ms | pass | pass | The agent correctly identifies the missing source currency and asks for clarification as required by the rubric.
control-explicit-aud-usd | 3683ms | pass | pass | The agent correctly provided the AUD to USD conversion immediately as required by the rubric.
control-explicit-gbp-jpy | 5032ms | pass | pass | The agent correctly converted 50 GBP to JPY immediately without asking for clarification, as required by the rubric.
control-explicit-eur-cad | 4749ms | pass | pass | The agent correctly provided the EUR to CAD conversion immediately without asking for clarification.
adversarial-deposit-context | 4818ms | pass | fail | The agent correctly provided deposit timing without asking for unnecessary currency clarification, adhering to the specific rubric requirement.
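
The headline pass rates can be re-derived from the eval cases above with a short Python sketch. It assumes that the two result tokens on each row are the baseline (live) and staged outcomes, in that order, which is consistent with the 10/10 and 9/10 totals reported earlier. The names `RESULTS` and `pass_rate` are illustrative and not part of Mender.

```python
# (case, baseline result, staged result), transcribed from the eval cases above.
RESULTS = [
    ("ambiguous-source-eur-clarification", "pass", "pass"),
    ("ambiguous-source-gbp-direct", "pass", "pass"),
    ("ambiguous-source-jpy-natural", "pass", "pass"),
    ("ambiguous-source-chf-short", "pass", "pass"),
    ("ambiguous-source-mxn-rate", "pass", "pass"),
    ("ambiguous-source-cad-question", "pass", "pass"),
    ("control-explicit-aud-usd", "pass", "pass"),
    ("control-explicit-gbp-jpy", "pass", "pass"),
    ("control-explicit-eur-cad", "pass", "pass"),
    ("adversarial-deposit-context", "pass", "fail"),
]


def pass_rate(results, column):
    """Fraction of rows whose result in the given column is 'pass'."""
    return sum(1 for row in results if row[column] == "pass") / len(results)


baseline_rate = pass_rate(RESULTS, 1)
staged_rate = pass_rate(RESULTS, 2)

# Cases that passed on the live prompt but failed on the patched one.
regressions = [
    case for case, base, staged in RESULTS if base == "pass" and staged == "fail"
]

print(f"baseline {baseline_rate:.0%}, staged {staged_rate:.0%}")
print("regressions:", regressions)
```

The only regression is `adversarial-deposit-context`, a case where the rubric expects no currency clarification at all.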

State history

at from to note
May 7 23:15:57 detected detected 3 affected traces
May 7 23:17:21 detected hypothesized Be concise and direct.
May 7 23:18:51 hypothesized evaluating baseline 10/10 pass
May 7 23:20:45 evaluating dismissed insufficient lift -10% (need +25%)
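
As a rough illustration, the time spent in each state can be computed from the timestamps above. The Python sketch below assumes the year 2026, inferred from the v2 release date quoted in the evidence, because the log itself prints no year; only the differences between timestamps matter here. `HISTORY` and `time_in_states` are hypothetical names.

```python
from datetime import datetime

# (timestamp, state entered), transcribed from the state history above.
HISTORY = [
    ("May 7 23:15:57", "detected"),
    ("May 7 23:17:21", "hypothesized"),
    ("May 7 23:18:51", "evaluating"),
    ("May 7 23:20:45", "dismissed"),
]

ASSUMED_YEAR = 2026  # assumption: the log omits the year


def time_in_states(history, year=ASSUMED_YEAR):
    """Return (state, seconds spent in it) for every state except the last."""
    stamps = [
        (datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S"), state)
        for ts, state in history
    ]
    return [
        (state, int((next_at - at).total_seconds()))
        for (at, state), (next_at, _) in zip(stamps, stamps[1:])
    ]


durations = time_in_states(HISTORY)
for state, seconds in durations:
    print(f"{state}: {seconds}s")
print("total:", sum(seconds for _, seconds in durations), "s")
```

For this incident the sketch yields 84s in `detected`, 90s in `hypothesized`, and 114s in `evaluating`, for 288s from detection to dismissal.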