incident 1760480a051d patch_proposed
Cluster
- pattern: ambiguous source currency silently defaulted to USD
- traces: 14
- project: finpay-support
- created: May 8 22:11:48
- updated: May 8 22:16:23
Baseline (live): 4/10 (40% pass · 117s)
Staged (patched): 9/10 (90% pass · 105s)
+50% lift
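The lift reported here matches the difference between the staged and baseline pass rates (90% minus 40%), which makes it a gain in percentage points rather than a relative improvement. A minimal sketch of that arithmetic, assuming lift is a plain difference of pass rates; the function names are illustrative and not taken from the tool:

```python
def pass_rate(passed: int, total: int) -> float:
    """Fraction of eval cases that passed."""
    if total <= 0:
        raise ValueError("total must be positive")
    return passed / total


def lift(baseline_passed: int, staged_passed: int, total: int) -> float:
    """Absolute lift: staged pass rate minus baseline pass rate."""
    return pass_rate(staged_passed, total) - pass_rate(baseline_passed, total)


# Figures from this incident: 4/10 at baseline, 9/10 staged.
print(f"{lift(4, 9, 10):+.0%}")  # prints "+50%"
```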
Hypothesis
The agent was explicitly instructed to default to USD for unspecified currencies, overriding the previous requirement to clarify ambiguous inputs.
Suspected prompt clause
Always assume USD if not specified.
- recommended: Remove the 'Always assume USD if not specified.' clause and restore the instruction to ask for clarification when the currency is ambiguous.
- evidence: The clause was added in v2 (released 2026-05-05T12:47:00+00:00), replacing the v1 instruction to never silently assume a default currency.
- confidence: 1.00
Mender self-eval — how well did this cycle perform
- overall: 0.90
- hypothesis correctness: 1.00
- fix effectiveness: 0.83
- eval set quality: 0.86
- token efficiency: 1.00
lift=+50%; hyp=1.00; evalq=0.86; tok=1.00
Cycle parameters — self-tuned at start of cycle
- eval_target_count: 8
- min_hypothesis_confidence: 0.6
- min_lift: 0.25
- cluster_max_failures: 20
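One plausible reading of these parameters is that they act as thresholds a cycle must clear before a patch is proposed. The sketch below encodes that reading. It is an assumption, not documented behavior: the `CycleParams` class, the `should_propose_patch` function, and the way the three checks are combined are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleParams:
    # Values copied from this cycle; the semantics are assumed.
    eval_target_count: int = 8
    min_hypothesis_confidence: float = 0.6
    min_lift: float = 0.25
    cluster_max_failures: int = 20


def should_propose_patch(
    params: CycleParams, confidence: float, lift: float, n_eval_cases: int
) -> bool:
    """Assumed gate: enough eval cases, a confident hypothesis, enough lift."""
    return (
        n_eval_cases >= params.eval_target_count
        and confidence >= params.min_hypothesis_confidence
        and lift >= params.min_lift
    )


# This cycle: confidence 1.00, lift 0.50, 10 eval cases.
print(should_propose_patch(CycleParams(), confidence=1.00, lift=0.50, n_eval_cases=10))
```

Under this reading the cycle clears every threshold, which is consistent with the `patch_proposed` state.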
Past-cycle introspection
- n_cycles_seen: 0
- trend: insufficient-data
Proposed patch — v2 → v3
Replaced the USD default rule with a requirement to ask for clarification on unspecified currencies.
--- finpay-support/v2
+++ finpay-support/v3
@@ -5,3 +5,3 @@
 - Convert amounts between currencies using `get_exchange_rate`.
-Always assume USD if not specified.
+If the user does not specify a currency, ask them to specify one before performing the conversion. Never assume a default currency.
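The patch changes a prompt, not code, but the behavior it asks for can be illustrated in a few lines. This is a sketch under stated assumptions: `ConversionRequest` and `respond` are hypothetical names, and the two-argument form of `get_exchange_rate` is assumed, since the source names the tool but not its signature.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ConversionRequest:
    amount: float
    target_currency: str
    source_currency: Optional[str] = None  # None means the user did not say


def respond(
    request: ConversionRequest,
    get_exchange_rate: Callable[[str, str], float],  # assumed signature
) -> str:
    """Ask for the source currency when it is missing; never default it."""
    if not request.source_currency:
        return "Which currency is the amount in? I need it before converting."
    rate = get_exchange_rate(request.source_currency, request.target_currency)
    converted = request.amount * rate
    return (
        f"{request.amount:.2f} {request.source_currency} = "
        f"{converted:.2f} {request.target_currency}"
    )


# Stub rate source, for illustration only.
def stub_rate(source: str, target: str) -> float:
    return 1.25


print(respond(ConversionRequest(100, "JPY"), stub_rate))
print(respond(ConversionRequest(50, "USD", "GBP"), stub_rate))
```

The first call returns a clarifying question, as in the `direct-unspecified-*` cases. The second converts immediately, as in the `control-explicit-*` cases.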
Eval cases
| case | latency | baseline (live) | staged | baseline judge |
|---|---|---|---|---|
| direct-unspecified-jpy-question | 17332ms | fail | pass | The agent assumed USD as the source currency and failed to ask for clarification as required by the rubric. |
| direct-unspecified-cad-command | 10311ms | fail | pass | The agent assumed the source currency was USD instead of asking for clarification as required by the rubric. |
| direct-unspecified-remittance-france | 12090ms | fail | pass | The agent provided a calculation without asking for the missing source currency as required by the rubric. |
| direct-unspecified-brl-rate | 6732ms | fail | pass | The agent defaulted to USD instead of asking the user to specify the base currency as required by the rubric. |
| direct-unspecified-chf-comparison | 10504ms | fail | pass | The agent assumed the source currency was USD instead of asking for clarification as required by the rubric. |
| direct-unspecified-aud-conversion | 8590ms | fail | pass | The agent provided a conversion value based on an assumed currency instead of asking for clarification as required. |
| control-explicit-usd-eur | 9569ms | pass | pass | The agent correctly converted the currency as requested without asking for clarification, following the rubric's specific instructions. |
| control-explicit-gbp-usd | 11259ms | pass | pass | The agent correctly converted 50 GBP to USD as requested without asking for unnecessary clarification. |
| control-explicit-cad-eur-remit | 11137ms | pass | pass | The agent correctly used CAD as the source currency and provided the conversion without unnecessary clarification as required. |
| adversarial-us-location-context | 9910ms | pass | fail | The agent correctly inferred USD from the Chicago context and provided the conversion to Pesos as required. |
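The headline counts hide one detail that a per-case tally makes visible: a case that passed at baseline and fails once the patch is staged. A small sketch that recomputes the totals from the rows above and lists fixed and regressed cases. The data is copied from the table; the `summarize` helper is illustrative.

```python
# (case, baseline result, staged result), copied from the eval table.
RESULTS = [
    ("direct-unspecified-jpy-question", "fail", "pass"),
    ("direct-unspecified-cad-command", "fail", "pass"),
    ("direct-unspecified-remittance-france", "fail", "pass"),
    ("direct-unspecified-brl-rate", "fail", "pass"),
    ("direct-unspecified-chf-comparison", "fail", "pass"),
    ("direct-unspecified-aud-conversion", "fail", "pass"),
    ("control-explicit-usd-eur", "pass", "pass"),
    ("control-explicit-gbp-usd", "pass", "pass"),
    ("control-explicit-cad-eur-remit", "pass", "pass"),
    ("adversarial-us-location-context", "pass", "fail"),
]


def summarize(results):
    """Return pass counts plus the cases the patch fixed or regressed."""
    baseline_passed = sum(1 for _, base, _ in results if base == "pass")
    staged_passed = sum(1 for _, _, staged in results if staged == "pass")
    fixed = [name for name, base, staged in results
             if base == "fail" and staged == "pass"]
    regressed = [name for name, base, staged in results
                 if base == "pass" and staged == "fail"]
    return baseline_passed, staged_passed, fixed, regressed


baseline_passed, staged_passed, fixed, regressed = summarize(RESULTS)
print(f"baseline {baseline_passed}/10, staged {staged_passed}/10")
print("regressed:", regressed)
```

The only regression is `adversarial-us-location-context`. The judge column describes the baseline run only, so the table does not say why the staged run failed that case.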
State history
| at | from | to | note |
|---|---|---|---|
| May 8 22:11:48 | detected | detected | 14 affected traces |
| May 8 22:12:01 | detected | hypothesized | Always assume USD if not specified. |
| May 8 22:14:32 | hypothesized | evaluating | baseline 4/10 pass |
| May 8 22:16:23 | evaluating | patch_proposed | +50% lift (4→9/10) |
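The history above can be checked mechanically: each row should start from the state the previous row ended in. The sketch below does that using only the transitions observed in this incident. The full set of states and allowed transitions is not given in the source, so `OBSERVED_TRANSITIONS` is an assumption drawn from these four rows and `is_consistent_history` is a hypothetical helper.

```python
# Transitions seen in this incident's history; not an exhaustive state machine.
OBSERVED_TRANSITIONS = {
    ("detected", "detected"),  # initial entry row
    ("detected", "hypothesized"),
    ("hypothesized", "evaluating"),
    ("evaluating", "patch_proposed"),
}


def is_consistent_history(history):
    """True if every step is an observed transition and the steps chain."""
    previous_to = None
    for from_state, to_state in history:
        if (from_state, to_state) not in OBSERVED_TRANSITIONS:
            return False
        if previous_to is not None and from_state != previous_to:
            return False
        previous_to = to_state
    return True


HISTORY = [
    ("detected", "detected"),
    ("detected", "hypothesized"),
    ("hypothesized", "evaluating"),
    ("evaluating", "patch_proposed"),
]
print(is_consistent_history(HISTORY))
```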