Mender
Self-healing for production agents.

About Mender

Mender is an autonomous agent that watches other agents in production. It reads their traces and eval scores via the Arize Phoenix MCP server, finds quality regressions, hypothesizes the root cause, generates and runs targeted evals, drafts a prompt patch, and asks for one-click approval in Slack.

Built for the Google Cloud Rapid Agent Hackathon, Arize track.

The loop

   ┌─────────────────────────────────────────────────────────┐
   │  every 15 min · Cloud Scheduler · Cloud Run             │
   └─────────────────────────────────────────────────────────┘
                                ↓
       ┌────────────┐    ┌──────────────────┐    ┌─────────────┐
       │  Phoenix   │ ←─ │  Mender (Gemini) │ ─→ │  FinPay     │
       │  (traces,  │    │  ADK + MCP       │    │  (target)   │
       │   evals,   │ ─→ │                  │ ←─ │  ADK        │
       │   prompts) │    └──────────────────┘    └─────────────┘
       └────────────┘             ↓
                          ┌──────────────┐
                          │  Slack       │   ← Block Kit incident card
                          │  webhook     │      with Approve/Discard
                          └──────────────┘
  

Components

AFinPay Support — the deliberately fragile target agent
BOpenInference instrumentation + LLM-as-judge eval scorer
C1–C2Mender ADK agent + Phoenix MCP toolset
C3Typed trace tools (timeseries, drill-down)
C4–C5Failure clustering + hypothesis generation
C6–C7Eval-set generator + runner
C8–C9Patch generator + staging mechanism
C10Incident state machine
DSlack action layer (Block Kit + interactive Approve)
EWeb UI (this app)

Repo

github.com/mattspaulding/mender-agent · MIT