Mender
Self-healing for production agents.

incident d60945c3d90a dismissed

Cluster

pattern: silent default of ambiguous source currency to USD
traces: 3
project: finpay-support
created: May 7 22:45:38
updated: May 7 22:51:26

Baseline (live)

10/10
100% pass · 95s

Staged (patched)

10/10
100% pass · 130s
+0% lift
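
The +0% lift follows directly from the two pass counts above. A minimal Python sketch, assuming lift is the absolute difference in pass rate between the staged and baseline runs (this page does not state the formula, and `pass_rate`, `lift`, and `headroom` are illustrative names, not Mender functions):

```python
def pass_rate(passed: int, total: int) -> float:
    """Fraction of eval cases that passed."""
    if total <= 0:
        raise ValueError("total must be positive")
    return passed / total


def lift(baseline: float, staged: float) -> float:
    """Assumed definition: absolute change in pass rate (staged minus baseline)."""
    return staged - baseline


def headroom(baseline: float) -> float:
    """Largest lift any patch could show against this baseline."""
    return 1.0 - baseline


baseline = pass_rate(10, 10)  # Baseline (live): 10/10
staged = pass_rate(10, 10)    # Staged (patched): 10/10

observed_lift = lift(baseline, staged)
max_possible_lift = headroom(baseline)
```

With the baseline already at 100%, the headroom is zero, so under this assumed definition no patch could register a positive lift on this eval set, whatever its merit.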

Hypothesis

The brevity and directness constraints prioritize providing an immediate answer over the clarification process required for ambiguous inputs, causing the agent to silently default to USD despite the instruction to ask for clarification.

Suspected prompt clause

Be concise and direct. Reply in one or two sentences.
recommended: Replace the 'Be concise and direct. Reply in one or two sentences.' clause with v1's 'Be concise, polite, and accurate. Reply in two or three sentences.' and restore the explicit prohibition: 'Never silently assume a default currency.'
evidence: The brevity constraint was introduced in v2 (released 2026-05-05T12:47Z) alongside the regression and remains in the current version (v3) where the failures are observed.
confidence: 0.80

Mender self-eval — how well did this cycle perform

overall: 0.13
hypothesis correctness: 0.00
fix effectiveness: 0.00
eval set quality: 0.20
token efficiency: 1.00

lift=+0%; hyp=0.00; evalq=0.20; tok=1.00

Cycle parameters — self-tuned at start of cycle

eval_target_count: 8
min_hypothesis_confidence: 0.6
min_lift: 0.25
cluster_max_failures: 20
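
How these thresholds might gate a cycle can be sketched as follows. This is a hypothetical decision rule, not Mender's actual logic: the page only shows that a lift below `min_lift` leads to dismissal. The confidence check, the `gate` function, and the `"accepted"` label are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleParams:
    # Values copied from the cycle parameters listed on this page.
    eval_target_count: int = 8
    min_hypothesis_confidence: float = 0.6
    min_lift: float = 0.25
    cluster_max_failures: int = 20


def gate(confidence: float, lift: float, params: CycleParams) -> str:
    """Hypothetical rule: dismiss on low confidence or on insufficient lift."""
    if confidence < params.min_hypothesis_confidence:
        return "dismissed: low hypothesis confidence"
    if lift < params.min_lift:
        return (
            f"dismissed: insufficient lift {lift:+.0%} "
            f"(need {params.min_lift:+.0%})"
        )
    return "accepted"  # assumed label; no accepted cycle appears on this page


# This cycle: hypothesis confidence 0.80, measured lift +0%.
decision = gate(confidence=0.80, lift=0.0, params=CycleParams())
```

Under this sketch the hypothesis clears the 0.6 confidence bar, and the cycle is dismissed only because the measured lift falls short of 0.25.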

Past-cycle introspection

n_cycles_seen: 0
trend: insufficient-data

Proposed patch — v3 → v4

Updated response length constraints and added an explicit prohibition against defaulting to a currency.

--- finpay-support/v3
+++ finpay-support/v4
@@ -1,3 +1,3 @@
-You are FinPay Support. Be concise and direct. Reply in one or two sentences.
+You are FinPay Support. Be concise, polite, and accurate. Reply in two or three sentences.
 Capabilities:
@@ -5,3 +5,3 @@
 - Convert amounts between currencies using `get_exchange_rate`.
-Ask for clarification if the source currency is not specified or ambiguous.
+Ask for clarification if the source currency is not specified or ambiguous. Never silently assume a default currency.
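
A diff of this shape can be reproduced with Python's standard `difflib` module. The sketch below uses only the two changed prompt lines visible in the patch, since the rest of the prompt is not shown on this page. The output will therefore have different hunk headers and no context lines, and the variable names are illustrative.

```python
import difflib

# Only the changed lines visible in the patch are reproduced here;
# the full v3 and v4 prompts are not available on this page.
v3_excerpt = [
    "You are FinPay Support. Be concise and direct. "
    "Reply in one or two sentences.",
    "Ask for clarification if the source currency is not specified "
    "or ambiguous.",
]
v4_excerpt = [
    "You are FinPay Support. Be concise, polite, and accurate. "
    "Reply in two or three sentences.",
    "Ask for clarification if the source currency is not specified "
    "or ambiguous. Never silently assume a default currency.",
]

# lineterm="" keeps the header lines free of trailing newlines,
# because the input lines carry none either.
diff_lines = list(
    difflib.unified_diff(
        v3_excerpt,
        v4_excerpt,
        fromfile="finpay-support/v3",
        tofile="finpay-support/v4",
        lineterm="",
    )
)
print("\n".join(diff_lines))
```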

Eval cases

case | latency | baseline (live) | staged | baseline judge
ambiguous-source-eur | 10142ms | pass | pass | The agent correctly asked for the source currency without making any assumptions about the 100 units.
ambiguous-source-gbp | 2904ms | pass | pass | The agent correctly identified the missing source currency and requested clarification without providing an actual conversion value.
ambiguous-source-jpy-conversational | 3052ms | pass | pass | The agent correctly requested clarification on the source currency without making any assumptions or providing conversion results.
ambiguous-source-cad-rate | 13161ms | pass | pass | The agent correctly requested the missing source currency instead of defaulting to USD, as required by the rubric.
ambiguous-source-chf-verb-variation | 9737ms | pass | pass | The agent correctly identifies the missing source currency and asks for clarification as required by the rubric.
ambiguous-source-mxn-large-num | 3778ms | pass | pass | The agent correctly requested the source currency and specific Peso type as required by the rubric.
explicit-usd-jpy-control | 12476ms | pass | pass | The agent correctly converted 50 USD to JPY as requested without asking for unnecessary clarification.
explicit-gbp-eur-control | 17416ms | pass | pass | The agent correctly converted 20 GBP to EUR and identified both currencies as required by the rubric.
explicit-jpy-aud-control | 18520ms | pass | pass | The agent provided a direct conversion from JPY to AUD without mentioning USD, meeting all rubric requirements.
non-conversion-withdrawal-adversarial | 4289ms | pass | pass | The agent correctly provided withdrawal instructions without asking for currency clarification, meeting all rubric requirements.
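
The eval cases listed above can be tallied by type to check whether the eval set reproduces the clustered failure at all. The sketch below is illustrative: the grouping by name prefix is an assumption based on the case names, and `category` is a hypothetical helper.

```python
from collections import Counter

# (case name, baseline result, staged result), transcribed from the eval cases.
cases = [
    ("ambiguous-source-eur", "pass", "pass"),
    ("ambiguous-source-gbp", "pass", "pass"),
    ("ambiguous-source-jpy-conversational", "pass", "pass"),
    ("ambiguous-source-cad-rate", "pass", "pass"),
    ("ambiguous-source-chf-verb-variation", "pass", "pass"),
    ("ambiguous-source-mxn-large-num", "pass", "pass"),
    ("explicit-usd-jpy-control", "pass", "pass"),
    ("explicit-gbp-eur-control", "pass", "pass"),
    ("explicit-jpy-aud-control", "pass", "pass"),
    ("non-conversion-withdrawal-adversarial", "pass", "pass"),
]


def category(name: str) -> str:
    """Group cases by name prefix (an assumed convention, not a Mender field)."""
    if name.startswith("ambiguous-source"):
        return "ambiguous"
    if name.startswith("explicit"):
        return "control"
    return "adversarial"


per_category = Counter(category(name) for name, _, _ in cases)

# Cases that fail on the live prompt are the only ones a patch can fix.
baseline_failures = [name for name, base, _ in cases if base != "pass"]
```

None of the six ambiguous-source cases fails on the baseline, so `baseline_failures` is empty and the set does not reproduce the silent-USD default seen in the three clustered traces. That may be what the low eval set quality score (0.20) reflects.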

State history

at from to note
May 7 22:45:38 detected detected 3 affected traces
May 7 22:47:03 detected hypothesized Be concise and direct. Reply in one or two sentences.
May 7 22:48:57 hypothesized evaluating baseline 10/10 pass
May 7 22:51:26 evaluating dismissed insufficient lift +0% (need +25%)