Skip to content

M27 solution: Part D capstone, a complete support agent

One deployable agent that uses every Part D pattern at once: agentic RAG with citations (M24), memory (M21), tracing and cost (M20/M25), reliability (M22), security guards (M22/M23), an eval gate (M20/M26), and an API (M11/M18). The whole thing runs offline with a deterministic mock model: no key, no spend.

Files

File Role
parts.py The Part D building blocks, trimmed and gathered: Trace+cost (M20/M25), ShortTermMemory/LongTermMemory (M21), retry/StepLimiter/approval_gate (M22), detect_injection/wrap_untrusted/redact_secrets/domain_allowed (M23). Each is labeled with its source module.
corpus.py The knowledge base + search tool for agentic RAG (M24); includes a two-hop question.
agent.py SupportAgent.chat(...), the integrated ReAct loop. Returns {answer, sources, blocked, cost, tokens, trace, injection_flags}. Injectable client.
mockmodel.py A deterministic fake model so the agent runs and is gated offline (searches, answers, attempts a risky email, greets).
evals.py The M26 gate over the agent: hours answer, multi-hop billing answer with sources, risky action blocked. Exits 0 pass / 1 fail.
app.py FastAPI /chat and /health, the deployable face.
demo.py End-to-end offline tour of all three scenarios with traces. Start here.
../starters/extend.py Turn it into a portfolio piece.

Run it

python demo.py            # offline tour, free
python evals.py ; echo $? # the eval gate (exit 0 = pass)

pip install fastapi "uvicorn[standard]"   # serve it
uvicorn app:app --reload
curl -s -X POST localhost:8000/chat -H 'Content-Type: application/json' -d '{"message":"What are your hours?"}'

How the patterns compose (per chat call)

  1. Recall user facts into the system prompt (memory, M21).
  2. limiter.tick() before any cost (step cap, M22).
  3. Model call wrapped in retry (M22), recorded as a span with a token and cost estimate (M20/M25).
  4. search_kb runs agentic RAG, tracking sources and wrapping results as untrusted DATA (M24/M23).
  5. send_email must pass the approval gate and the domain allowlist, with secrets redacted (M22/M23).
  6. Degrades to a safe message if the model keeps failing or the loop runs long (M22).
  7. evals.py gates all of it; app.py serves it.

Verified (offline)

  • Multi-hop knowledge question: answer names Dana Okafor, sources include D1 and D3, cost and a multi-span trace are produced.
  • Direct question ("hours"): one search, correct answer, source D4.
  • Risky email to a non-allowlisted domain: blocked == ['send_email'] (approval gate fires; allowlist would too); an allowlisted address with an approving approver is NOT blocked.
  • Memory recall returns a stored user fact; the step cap terminates a pathological infinite-loop model.
  • evals.py scores 3/3 and exits 0; a regression would exit 1 (CI gate, M26).
  • app.py verified with FastAPI TestClient: /health ok, /chat returns 200 with the full result.
  • All files compile. Offline via the mock; a live run reuses the M4 key.

What is simplified (and the production swap)

  • Per-process memory -> persist per user/session (M21 save_state).
  • Keyword search -> the M7 vector store for semantic retrieval.
  • Mock model -> live model, with deterministic eval subset in CI and live evals on a schedule (M26).
  • Estimated cost -> real response.usage (M25). The architecture does not change; only the backends.