About Mender
Mender is an autonomous agent that watches other agents in production. It reads their traces and eval scores via the Arize Phoenix MCP server, finds quality regressions, hypothesizes the root cause, generates and runs targeted evals, drafts a prompt patch, and asks for one-click approval in Slack.
Built for the Google Cloud Rapid Agent Hackathon, Arize track.
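The "finds quality regressions" step described above can be sketched in a few lines. This is a minimal illustration rather than Mender's actual detector: the function name `detect_regression`, the window size, and the drop threshold are all assumptions, and the scores are taken to be a chronological list of eval scores between 0 and 1.

```python
from statistics import mean


def detect_regression(scores, window=20, min_drop=0.10):
    """Compare the newest `window` scores with the `window` before them.

    Hypothetical helper: returns None when there is too little data,
    otherwise a dict describing the change between the two windows.
    """
    if len(scores) < 2 * window:
        return None  # not enough data for two full windows
    baseline = mean(scores[-2 * window:-window])
    recent = mean(scores[-window:])
    drop = baseline - recent
    return {
        "baseline": baseline,
        "recent": recent,
        "drop": drop,
        "regressed": drop >= min_drop,
    }


# Example: 20 healthy scores followed by 20 degraded ones.
result = detect_regression([0.9] * 20 + [0.6] * 20)
print(result["regressed"])
```

A fixed-window comparison like this is easy to reason about on a 15-minute schedule, though a real detector would likely also need to account for low traffic and noisy judge scores.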
The loop
┌─────────────────────────────────────────────────────────┐
│       every 15 min · Cloud Scheduler · Cloud Run        │
└─────────────────────────────────────────────────────────┘
                            ↓
┌────────────┐    ┌──────────────────┐    ┌─────────────┐
│  Phoenix   │ ←─ │ Mender (Gemini)  │ ─→ │   FinPay    │
│  (traces,  │    │    ADK + MCP     │    │  (target)   │
│   evals,   │ ─→ │                  │ ←─ │     ADK     │
│  prompts)  │    └──────────────────┘    └─────────────┘
└────────────┘             ↓
                    ┌──────────────┐
                    │    Slack     │ ← Block Kit incident card
                    │   webhook    │   with Approve/Discard
                    └──────────────┘
Components
| ID | Component |
|---|---|
| A | FinPay Support: the deliberately fragile target agent |
| B | OpenInference instrumentation + LLM-as-judge eval scorer |
| C1–C2 | Mender ADK agent + Phoenix MCP toolset |
| C3 | Typed trace tools (timeseries, drill-down) |
| C4–C5 | Failure clustering + hypothesis generation |
| C6–C7 | Eval-set generator + runner |
| C8–C9 | Patch generator + staging mechanism |
| C10 | Incident state machine |
| D | Slack action layer (Block Kit + interactive Approve) |
| E | Web UI (this app) |
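To make the incident state machine (C10) concrete, here is one way such a component might look. The state names and allowed transitions below are assumptions inferred from the pipeline stages listed above (detect, hypothesize, evaluate, stage a patch, approve or discard); the real component may use different states.

```python
from enum import Enum


class State(Enum):
    DETECTED = "detected"
    HYPOTHESIZED = "hypothesized"
    EVALUATED = "evaluated"
    PATCH_STAGED = "patch_staged"
    APPROVED = "approved"
    DISCARDED = "discarded"


# Hypothetical transition table: each state maps to the states it may move to.
TRANSITIONS = {
    State.DETECTED: {State.HYPOTHESIZED},
    State.HYPOTHESIZED: {State.EVALUATED},
    State.EVALUATED: {State.PATCH_STAGED, State.DISCARDED},
    State.PATCH_STAGED: {State.APPROVED, State.DISCARDED},
    State.APPROVED: set(),   # terminal
    State.DISCARDED: set(),  # terminal
}


class Incident:
    def __init__(self, incident_id):
        self.incident_id = incident_id
        self.state = State.DETECTED
        self.history = [State.DETECTED]

    def advance(self, new_state):
        """Move to `new_state`, rejecting transitions not in the table."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state
        self.history.append(new_state)


incident = Incident("inc-001")
for step in (State.HYPOTHESIZED, State.EVALUATED, State.PATCH_STAGED, State.APPROVED):
    incident.advance(step)
print(incident.state.name)
```

Keeping the transitions in a single table means a Slack click on Approve can only take effect while the incident is in `PATCH_STAGED`, which guards against stale or duplicate button presses.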