incident 98f497db1a90 dismissed
Cluster
- pattern: silent default to USD for ambiguous source currency
- traces: 7
- project: finpay-support
- created: May 7 21:15:25
- updated: May 7 21:21:18
Baseline (live): 4/10 (40% pass · 144s)
Staged (patched): 6/10 (60% pass · 159s)
+20% lift
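
The lift figure above is consistent with an absolute difference in pass rate (percentage points), not a relative improvement. A minimal sketch of that arithmetic follows; the helper name `pass_rate` is hypothetical and this is not taken from Mender's implementation.

```python
from fractions import Fraction

def pass_rate(passed: int, total: int) -> Fraction:
    """Exact pass rate as a fraction, which avoids float rounding."""
    return Fraction(passed, total)

baseline = pass_rate(4, 10)  # live prompt (v2)
staged = pass_rate(6, 10)    # patched prompt (v3)

# Absolute lift in percentage points: 60% - 40%.
lift_pp = (staged - baseline) * 100

# Relative improvement over the baseline, shown only for contrast.
relative_pct = (staged - baseline) / baseline * 100

print(f"absolute lift: +{int(lift_pp)} pp")       # absolute lift: +20 pp
print(f"relative lift: +{int(relative_pct)}%")    # relative lift: +50%
```

Reading "+20%" as percentage points matches the dismissal note in the state history, which compares it against a +25% threshold.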
Hypothesis
Version v2 added a mandatory defaulting rule that overrides the earlier requirement to ask for clarification when the source currency is ambiguous, so the model assumes USD instead.
Suspected prompt clause
Always assume USD if not specified.
- recommended: Remove the 'Always assume USD if not specified' clause and restore the clarification logic from version v1.
- evidence: The defaulting clause was added in v2 (released 2026-05-05T12:47:00+00:00), which the version notes identify as the cause of the regression.
- confidence: 1.00
Mender self-eval — how well did this cycle perform
- overall: 0.53
- hypothesis correctness: 0.50
- fix effectiveness: 0.33
- eval set quality: 0.86
- token efficiency: 1.00
lift=+20%; hyp=0.50; evalq=0.86; tok=1.00
Cycle parameters — self-tuned at start of cycle
- eval_target_count: 8
- min_hypothesis_confidence: 0.6
- min_lift: 0.25
- cluster_max_failures: 20
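
As a rough illustration of how these parameters could gate a cycle, the sketch below checks the hypothesis confidence and the measured lift against the two thresholds. The `CycleParams` class, the `gate` function, the "accepted" outcome and the low-confidence message are all hypothetical; only the parameter names, their values and the wording of the insufficient-lift note come from this report.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CycleParams:
    # Values copied from the cycle parameters listed above.
    min_hypothesis_confidence: float = 0.6
    min_lift: float = 0.25

def gate(confidence: float, baseline_passed: int, staged_passed: int,
         total: int, params: CycleParams) -> str:
    """Return a hypothetical outcome string for one evaluation cycle."""
    if confidence < params.min_hypothesis_confidence:
        return "dismissed: hypothesis confidence below threshold"
    # Absolute lift in pass rate between staged and baseline runs.
    lift = (staged_passed - baseline_passed) / total
    if lift < params.min_lift:
        return (f"dismissed: insufficient lift {lift:+.0%} "
                f"(need {params.min_lift:+.0%})")
    return "accepted"

# This incident: confidence 1.00, baseline 4/10, staged 6/10.
print(gate(1.00, 4, 6, 10, CycleParams()))
# dismissed: insufficient lift +20% (need +25%)
```

Under these assumptions the confidence check passes (1.00 is above 0.6) and the cycle is dismissed on lift alone, since one more passing case (7/10) would have cleared the 0.25 threshold.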
Past-cycle introspection
- n_cycles_seen: 0
- trend: insufficient-data
Proposed patch — v2 → v3
Removed the mandatory USD default rule.
--- finpay-support/v2
+++ finpay-support/v3
@@ -3,5 +3,3 @@
Capabilities:
- Answer general questions about the app.
- - Convert amounts between currencies using `get_exchange_rate`.
-
-Always assume USD if not specified.
+ - Convert amounts between currencies using `get_exchange_rate`.
Eval cases
| case | baseline (live) | staged | baseline judge |
|---|---|---|---|
| ambiguous-source-eur · 10357ms | fail | pass | The agent assumed USD as the source currency instead of asking for clarification as required by the rubric. |
| ambiguous-source-mxn · 10436ms | fail | fail | The agent assumed USD and performed the conversion instead of asking for the source currency as required. |
| ambiguous-source-jpy · 8322ms | fail | fail | The agent assumed USD as the source currency instead of asking for clarification as required by the rubric. |
| ambiguous-source-gbp-context · 3768ms | fail | fail | The agent assumed the source currency was USD instead of asking for clarification as required. |
| ambiguous-source-sgd · 15733ms | fail | pass | The agent assumed the source currency was USD instead of asking the user for clarification as required by the rubric. |
| ambiguous-source-cad-rate · 17896ms | fail | fail | The agent assumed USD as the source currency instead of asking for clarification as required by the rubric. |
| explicit-source-gbp-control · 31756ms | pass | pass | The agent correctly converted 100 GBP to USD without asking for clarification, following the rubric exactly. |
| explicit-source-eur-control · 12507ms | pass | pass | The agent correctly converted the specified amount from EUR to JPY without asking for a source currency. |
| explicit-source-usd-control · 27665ms | pass | pass | The agent correctly converted 200 USD to AUD without asking for clarification, adhering to the rubric requirements. |
| adversarial-metric-conversion · 6006ms | pass | pass | The agent correctly converted the distance without asking for a source currency or mentioning USD as required by the rubric. |
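
The per-case results can be tallied to cross-check the headline counts and to see which cases the patch did not fix. This is a minimal sketch with the pass/fail values transcribed from the table; the variable names are illustrative only.

```python
# (baseline, staged) result per eval case, transcribed from the table.
results = {
    "ambiguous-source-eur": ("fail", "pass"),
    "ambiguous-source-mxn": ("fail", "fail"),
    "ambiguous-source-jpy": ("fail", "fail"),
    "ambiguous-source-gbp-context": ("fail", "fail"),
    "ambiguous-source-sgd": ("fail", "pass"),
    "ambiguous-source-cad-rate": ("fail", "fail"),
    "explicit-source-gbp-control": ("pass", "pass"),
    "explicit-source-eur-control": ("pass", "pass"),
    "explicit-source-usd-control": ("pass", "pass"),
    "adversarial-metric-conversion": ("pass", "pass"),
}

baseline_pass = sum(b == "pass" for b, _ in results.values())
staged_pass = sum(s == "pass" for _, s in results.values())

# Cases the patch repaired, and cases that still fail when staged.
fixed = [name for name, (b, s) in results.items()
         if b == "fail" and s == "pass"]
still_failing = [name for name, (_, s) in results.items() if s == "fail"]

print(f"baseline {baseline_pass}/{len(results)}, staged {staged_pass}/{len(results)}")
# baseline 4/10, staged 6/10
print("fixed:", fixed)
print("still failing:", still_failing)
```

The tally reproduces 4/10 and 6/10. All four control and adversarial cases pass in both runs, so the entire lift comes from two of the six ambiguous-source cases; the other four still fail after the clause is removed.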
State history
| at | from | to | note |
|---|---|---|---|
| May 7 21:15:25 | detected | detected | 7 affected traces |
| May 7 21:15:28 | detected | hypothesized | Always assume USD if not specified. |
| May 7 21:18:33 | hypothesized | evaluating | baseline 4/10 pass |
| May 7 21:21:18 | evaluating | dismissed | insufficient lift +20% (need +25%) |