About Mender

Mender is an autonomous agent that watches other agents in production. It reads their traces and eval scores via the Arize Phoenix MCP server, finds quality regressions, hypothesizes the root cause, generates and runs targeted evals, drafts a prompt patch, and asks for one-click approval in Slack.

Built for the Google Cloud Rapid Agent Hackathon, Arize track.

The loop

   ┌─────────────────────────────────────────────────────────┐
   │  every 15 min · Cloud Scheduler · Cloud Run             │
   └─────────────────────────────────────────────────────────┘
                                ↓
       ┌────────────┐    ┌──────────────────┐    ┌─────────────┐
       │  Phoenix   │ ←─ │  Mender (Gemini) │ ─→ │  FinPay     │
       │  (traces,  │    │  ADK + MCP       │    │  (target)   │
       │   evals,   │ ─→ │                  │ ←─ │  ADK        │
       │   prompts) │    └──────────────────┘    └─────────────┘
       └────────────┘             ↓
                          ┌──────────────┐
                          │  Slack       │   ← Block Kit incident card
                          │  webhook     │      with Approve/Discard
                          └──────────────┘

Components

A	FinPay Support — the deliberately fragile target agent
B	OpenInference instrumentation + LLM-as-judge eval scorer
C1–C2	Mender ADK agent + Phoenix MCP toolset
C3	Typed trace tools (timeseries, drill-down)
C4–C5	Failure clustering + hypothesis generation
C6–C7	Eval-set generator + runner
C8–C9	Patch generator + staging mechanism
C10	Incident state machine
D	Slack action layer (Block Kit + interactive Approve)
E	Web UI (this app)

Repo

github.com/mattspaulding/mender-agent · MIT