Mender
Self-healing for production agents.

incident 1760480a051d patch_proposed

Cluster

pattern
ambiguous source currency silently defaulted to USD
traces
14
project
finpay-support
created
May 8 22:11:48
updated
May 8 22:16:23

Baseline (live)

4/10
40% pass · 117s

Staged (patched)

9/10
90% pass · 105s
+50% lift (percentage points, 40% → 90%)

Hypothesis

The agent was explicitly instructed to default to USD for unspecified currencies, overriding the previous requirement to clarify ambiguous inputs.

Suspected prompt clause

Always assume USD if not specified.
recommended
Remove the 'Always assume USD if not specified.' clause and restore the instruction to ask for clarification when the currency is ambiguous.
evidence
The clause was added in v2 (released 2026-05-05T12:47:00+00:00), replacing the v1 instruction to never silently assume a default currency.
confidence
1.00

Mender self-eval — how well did this cycle perform

overall
0.90
hypothesis correctness
1.00
fix effectiveness
0.83
eval set quality
0.86
token efficiency
1.00

lift=+50%; hyp=1.00; evalq=0.86; tok=1.00

Cycle parameters — self-tuned at start of cycle

eval_target_count
8
min_hypothesis_confidence
0.6
min_lift
0.25
cluster_max_failures
20
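
One plausible reading of these thresholds, sketched in Python, is a gate that a cycle must clear before a patch is proposed. The names `CycleParams` and `should_propose_patch`, and the rule that both thresholds must be met, are assumptions for illustration; the page lists only the parameter values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleParams:
    # Values copied from the cycle parameters listed above.
    eval_target_count: int = 8
    min_hypothesis_confidence: float = 0.6
    min_lift: float = 0.25
    cluster_max_failures: int = 20


def should_propose_patch(params, confidence, baseline_pass, staged_pass, n_cases):
    """Hypothetical gate: both thresholds must be met (assumed semantics)."""
    # Absolute lift in pass rate, e.g. (9 - 4) / 10 = 0.5.
    lift = (staged_pass - baseline_pass) / n_cases
    return (
        confidence >= params.min_hypothesis_confidence
        and lift >= params.min_lift
    )


params = CycleParams()
# This incident: confidence 1.00, baseline 4/10, staged 9/10.
print(should_propose_patch(params, 1.00, 4, 9, 10))
```

Under these assumed semantics the incident clears both thresholds, which is consistent with it reaching the patch_proposed state.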

Past-cycle introspection

n_cycles_seen
0
trend
insufficient-data

Proposed patch — v2 → v3

Replaced the USD default rule with a requirement to ask for clarification on unspecified currencies.

--- finpay-support/v2
+++ finpay-support/v3
@@ -5,3 +5,3 @@
 - Convert amounts between currencies using `get_exchange_rate`.
-Always assume USD if not specified.
+If the user does not specify a currency, ask them to specify one before performing the conversion. Never assume a default currency.
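
A diff of this shape can be reproduced with Python's standard `difflib`. The sketch below uses only the two prompt lines visible in the hunk; the rest of the v2 and v3 prompts is not shown on this page, so the hunk header it prints differs from the `@@ -5,3 +5,3 @@` above.

```python
import difflib

# Only the lines visible in the hunk; the full prompts are not shown here.
v2 = [
    "- Convert amounts between currencies using `get_exchange_rate`.",
    "Always assume USD if not specified.",
]
v3 = [
    "- Convert amounts between currencies using `get_exchange_rate`.",
    "If the user does not specify a currency, ask them to specify one "
    "before performing the conversion. Never assume a default currency.",
]

diff = list(
    difflib.unified_diff(
        v2,
        v3,
        fromfile="finpay-support/v2",
        tofile="finpay-support/v3",
        lineterm="",  # inputs carry no trailing newlines
    )
)
print("\n".join(diff))
```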

Eval cases

case | latency | baseline (live) | staged | baseline judge
direct-unspecified-jpy-question | 17332ms | fail | pass | The agent assumed USD as the source currency and failed to ask for clarification as required by the rubric.
direct-unspecified-cad-command | 10311ms | fail | pass | The agent assumed the source currency was USD instead of asking for clarification as required by the rubric.
direct-unspecified-remittance-france | 12090ms | fail | pass | The agent provided a calculation without asking for the missing source currency as required by the rubric.
direct-unspecified-brl-rate | 6732ms | fail | pass | The agent defaulted to USD instead of asking the user to specify the base currency as required by the rubric.
direct-unspecified-chf-comparison | 10504ms | fail | pass | The agent assumed the source currency was USD instead of asking for clarification as required by the rubric.
direct-unspecified-aud-conversion | 8590ms | fail | pass | The agent provided a conversion value based on an assumed currency instead of asking for clarification as required.
control-explicit-usd-eur | 9569ms | pass | pass | The agent correctly converted the currency as requested without asking for clarification, following the rubric's specific instructions.
control-explicit-gbp-usd | 11259ms | pass | pass | The agent correctly converted 50 GBP to USD as requested without asking for unnecessary clarification.
control-explicit-cad-eur-remit | 11137ms | pass | pass | The agent correctly used CAD as the source currency and provided the conversion without unnecessary clarification as required.
adversarial-us-location-context | 9910ms | pass | fail | The agent correctly inferred USD from the Chicago context and provided the conversion to Pesos as required.
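
The headline numbers can be checked directly against the eval cases. The sketch below tallies the baseline and staged verdicts listed above and flags any case that passed on baseline but failed when staged; the helper name `pass_count` is illustrative only.

```python
# (case, baseline verdict, staged verdict), copied from the eval cases above.
results = [
    ("direct-unspecified-jpy-question", "fail", "pass"),
    ("direct-unspecified-cad-command", "fail", "pass"),
    ("direct-unspecified-remittance-france", "fail", "pass"),
    ("direct-unspecified-brl-rate", "fail", "pass"),
    ("direct-unspecified-chf-comparison", "fail", "pass"),
    ("direct-unspecified-aud-conversion", "fail", "pass"),
    ("control-explicit-usd-eur", "pass", "pass"),
    ("control-explicit-gbp-usd", "pass", "pass"),
    ("control-explicit-cad-eur-remit", "pass", "pass"),
    ("adversarial-us-location-context", "pass", "fail"),
]


def pass_count(rows, column):
    """Count rows whose verdict in the given column is 'pass'."""
    return sum(1 for row in rows if row[column] == "pass")


total = len(results)
baseline = pass_count(results, 1)
staged = pass_count(results, 2)
lift_pp = (staged - baseline) * 100 // total  # lift in percentage points

# Cases that passed on baseline but failed with the patch staged.
regressions = [case for case, b, s in results if b == "pass" and s == "fail"]

print(f"baseline {baseline}/{total}, staged {staged}/{total}, lift +{lift_pp} pp")
print("regressions:", regressions)
```

The tally matches the 4/10 and 9/10 figures reported at the top of the page, and the single regression is the adversarial case.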

State history

at from to note
May 8 22:11:48 detected detected 14 affected traces
May 8 22:12:01 detected hypothesized Always assume USD if not specified.
May 8 22:14:32 hypothesized evaluating baseline 4/10 pass
May 8 22:16:23 evaluating patch_proposed +50% lift (4→9/10)
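
The history above traces a single linear path through the incident states. As a minimal sketch, the transitions observed here can be written as a lookup table and used to check a sequence of states; the names `OBSERVED` and `is_observed_path` are hypothetical, and Mender may define other states or transitions not visible on this page.

```python
# Transitions observed in this incident's history; other states may exist.
OBSERVED = {
    "detected": "hypothesized",
    "hypothesized": "evaluating",
    "evaluating": "patch_proposed",
}


def is_observed_path(states):
    """True if every consecutive pair of states is an observed transition."""
    return all(OBSERVED.get(a) == b for a, b in zip(states, states[1:]))


history = ["detected", "hypothesized", "evaluating", "patch_proposed"]
print(is_observed_path(history))
```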