incident c2b124b92bb9 patch_applied
Cluster
- pattern: silent default to USD for ambiguous source currency
- traces: 7
- project: finpay-support
- created: May 7 21:22:01
- updated: May 7 21:41:10
Baseline (live): 4/10 (40% pass · 195s)
Staged (patched): 9/10 (90% pass · 133s)
+50 percentage-point lift (40% → 90%)
Hypothesis
The agent silently defaults to USD because of a newly added instruction to assume USD when currency is not specified, which overrides the previous logic requiring clarification for ambiguous inputs.
Suspected prompt clause
Always assume USD if not specified.
- recommended: Remove the 'Always assume USD if not specified.' clause and replace it with an instruction to ask for clarification if the source currency is ambiguous.
- evidence: The clause was added in v2 (released 2026-05-05T12:47:00+00:00), replacing the v1 requirement to ask a clarifying question for ambiguous currencies.
- confidence: 1.00
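The recommended behavior can be sketched as a small decision function. This is only an illustration of the intended logic, not the agent's actual implementation: `KNOWN_CURRENCIES`, `resolve_source_currency`, and `next_action` are hypothetical names, and the currency set is an assumption drawn from the eval case names.

```python
from typing import Optional

# Hypothetical set of recognized codes (assumption, not from the finpay-support prompt).
KNOWN_CURRENCIES = {"USD", "EUR", "CAD", "CHF", "AUD", "BRL", "INR", "GBP"}


def resolve_source_currency(stated: Optional[str]) -> Optional[str]:
    """Return a normalized currency code, or None if it is missing or unrecognized."""
    if stated is None:
        return None
    code = stated.strip().upper()
    return code if code in KNOWN_CURRENCIES else None


def next_action(stated_source: Optional[str]) -> str:
    """Ask for clarification instead of silently defaulting to USD."""
    source = resolve_source_currency(stated_source)
    if source is None:
        return "ask_clarification"
    return f"convert_from:{source}"
```

Under these assumptions, `next_action(None)` asks for clarification, while `next_action("eur")` proceeds with the conversion from EUR.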
Mender self-eval — how well did this cycle perform
overall: 0.90
hypothesis correctness: 1.00
fix effectiveness: 0.83
eval set quality: 0.86
token efficiency: 1.00
lift=+50%; hyp=1.00; evalq=0.86; tok=1.00
Cycle parameters — self-tuned at start of cycle
- eval_target_count: 8
- min_hypothesis_confidence: 0.6
- min_lift: 0.25
- cluster_max_failures: 20
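One plausible reading of these parameters is that they gate whether a patch gets proposed. The sketch below assumes that interpretation. The report does not state how the thresholds are combined, so `CycleParams` and `should_propose_patch` are hypothetical, and lift is assumed to be the absolute difference in pass rate.

```python
from dataclasses import dataclass


@dataclass
class CycleParams:
    # Values copied from the cycle parameters listed in this report.
    eval_target_count: int = 8
    min_hypothesis_confidence: float = 0.6
    min_lift: float = 0.25
    cluster_max_failures: int = 20


def should_propose_patch(
    params: CycleParams,
    baseline_pass: int,
    staged_pass: int,
    n_cases: int,
    confidence: float,
) -> bool:
    """Assumed gate: enough eval cases, a confident hypothesis, and sufficient lift."""
    lift = (staged_pass - baseline_pass) / n_cases  # absolute pass-rate difference
    return (
        n_cases >= params.eval_target_count
        and confidence >= params.min_hypothesis_confidence
        and lift >= params.min_lift
    )
```

With this cycle's numbers (4/10 baseline, 9/10 staged, confidence 1.00), the assumed gate passes: the lift of 0.5 exceeds `min_lift` of 0.25.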
Past-cycle introspection
- n_cycles_seen: 0
- trend: insufficient-data
Proposed patch — v2 → v3
Replaced the USD default assumption with a requirement to ask for clarification on ambiguous currencies.
--- finpay-support/v2
+++ finpay-support/v3
@@ -5,3 +5,3 @@
 - Convert amounts between currencies using `get_exchange_rate`.
-Always assume USD if not specified.
+Ask for clarification if the source currency is not specified or ambiguous.
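A diff of this shape can be regenerated with Python's standard `difflib`. The sketch below uses only the two prompt lines visible in the hunk. The rest of the v2 and v3 prompts is not shown in this report, so the resulting hunk header will differ from the one above.

```python
import difflib

# Only the two lines visible in the hunk; the full prompts are not in this report.
V2_LINES = [
    "- Convert amounts between currencies using `get_exchange_rate`.",
    "Always assume USD if not specified.",
]
V3_LINES = [
    "- Convert amounts between currencies using `get_exchange_rate`.",
    "Ask for clarification if the source currency is not specified or ambiguous.",
]


def prompt_diff(old: list[str], new: list[str]) -> list[str]:
    """Return unified-diff lines between two prompt versions."""
    return list(
        difflib.unified_diff(
            old,
            new,
            fromfile="finpay-support/v2",
            tofile="finpay-support/v3",
            lineterm="",
        )
    )


if __name__ == "__main__":
    print("\n".join(prompt_diff(V2_LINES, V3_LINES)))
```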
Eval cases
| case (latency) | baseline (live) | staged | baseline judge |
|---|---|---|---|
| ambiguous-source-eur (26236ms) | fail | pass | The agent assumed the source currency was USD instead of asking for clarification as required by the rubric. |
| ambiguous-source-cad (20553ms) | fail | pass | The agent assumed USD as the source currency instead of asking for clarification as required by the rubric. |
| ambiguous-source-chf (17042ms) | fail | pass | The agent assumed USD instead of asking for clarification on the source currency as required by the rubric. |
| ambiguous-source-aud (17223ms) | fail | pass | The agent assumed the input was in USD and performed a calculation instead of asking for clarification as required. |
| ambiguous-source-brl (12537ms) | fail | pass | The agent defaulted to USD instead of requesting the missing source currency as required by the rubric. |
| ambiguous-source-inr (18587ms) | fail | fail | The agent assumed USD instead of asking for clarification regarding the source currency as required by the rubric. |
| explicit-conversion-gbp-usd (38665ms) | pass | pass | The agent correctly identified the source and target currencies and provided an accurate conversion. |
| general-rate-query (11423ms) | pass | pass | The agent provided the exchange rate directly without asking for an amount or unnecessary clarifications as per the rubric. |
| non-conversion-password (3307ms) | pass | pass | The agent provided clear password reset instructions and correctly avoided mentioning currencies as required by the rubric. |
| explicit-source-aud-to-usd (29422ms) | pass | pass | The agent correctly identified the source currency and performed the conversion without asking for unnecessary clarification. |
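The headline numbers can be cross-checked against the per-case results. The following is a minimal tally with the outcomes copied from the eval cases. The helper names `pass_count`, `still_failing`, and `regressions` are illustrative, not part of any tool referenced here.

```python
# (case, baseline outcome, staged outcome), copied from the eval cases in this report.
RESULTS = [
    ("ambiguous-source-eur", "fail", "pass"),
    ("ambiguous-source-cad", "fail", "pass"),
    ("ambiguous-source-chf", "fail", "pass"),
    ("ambiguous-source-aud", "fail", "pass"),
    ("ambiguous-source-brl", "fail", "pass"),
    ("ambiguous-source-inr", "fail", "fail"),
    ("explicit-conversion-gbp-usd", "pass", "pass"),
    ("general-rate-query", "pass", "pass"),
    ("non-conversion-password", "pass", "pass"),
    ("explicit-source-aud-to-usd", "pass", "pass"),
]


def pass_count(results, column: int) -> int:
    """Count passes in column 1 (baseline) or column 2 (staged)."""
    return sum(1 for row in results if row[column] == "pass")


def still_failing(results) -> list[str]:
    """Cases that fail under the staged prompt."""
    return [case for case, _, staged in results if staged == "fail"]


def regressions(results) -> list[str]:
    """Cases that passed on baseline but fail when staged."""
    return [case for case, base, staged in results if base == "pass" and staged == "fail"]


baseline = pass_count(RESULTS, 1)  # 4
staged = pass_count(RESULTS, 2)    # 9
lift = (staged - baseline) / len(RESULTS)  # 0.5, i.e. 50 percentage points
```

The tally matches the 4/10 and 9/10 summary, shows no regressions among the previously passing cases, and leaves `ambiguous-source-inr` as the single case still failing after the patch.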
State history
| at | from | to | note |
|---|---|---|---|
| May 7 21:22:01 | detected | detected | 7 affected traces |
| May 7 21:22:08 | detected | hypothesized | Always assume USD if not specified. |
| May 7 21:25:44 | hypothesized | evaluating | baseline 4/10 pass |
| May 7 21:28:00 | evaluating | patch_proposed | +50% lift (4→9/10) |
| May 7 21:41:10 | patch_proposed | patch_applied | promoted v2->v3 |
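To see where the cycle spent its time, the timestamps in the state history can be turned into per-state durations. This sketch assumes all entries fall on the same day, as they do here, so only the time of day is parsed. `seconds_in_each_state` is an illustrative helper.

```python
from datetime import datetime

# (timestamp, state entered), copied from the "at" and "to" columns of the state history.
HISTORY = [
    ("21:22:01", "detected"),
    ("21:22:08", "hypothesized"),
    ("21:25:44", "evaluating"),
    ("21:28:00", "patch_proposed"),
    ("21:41:10", "patch_applied"),
]


def seconds_in_each_state(history) -> dict[str, int]:
    """Seconds spent in each state before the next transition (same-day timestamps)."""
    times = [datetime.strptime(stamp, "%H:%M:%S") for stamp, _ in history]
    return {
        history[i][1]: int((times[i + 1] - times[i]).total_seconds())
        for i in range(len(history) - 1)
    }


durations = seconds_in_each_state(HISTORY)
# {'detected': 7, 'hypothesized': 216, 'evaluating': 136, 'patch_proposed': 790}
```

Most of the 19 minutes 9 seconds between detection and promotion (790 of 1149 seconds) was spent in `patch_proposed`, waiting for the patch to be applied.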