M27 solution: Part D capstone, a complete support agent
One deployable agent that uses every Part D pattern at once: agentic RAG with citations (M24), memory (M21), tracing and cost (M20/M25), reliability (M22), security guards (M22/M23), an eval gate (M20/M26), and an API (M11/M18). The whole thing runs offline with a deterministic mock model: no key, no spend.
Files
| File | Role |
|---|---|
| `parts.py` | The Part D building blocks, trimmed and gathered: `Trace` + cost (M20/M25), `ShortTermMemory`/`LongTermMemory` (M21), `retry`/`StepLimiter`/`approval_gate` (M22), `detect_injection`/`wrap_untrusted`/`redact_secrets`/`domain_allowed` (M23). Each is labeled with its source module. |
| `corpus.py` | The knowledge base + search tool for agentic RAG (M24); includes a two-hop question. |
| `agent.py` | `SupportAgent.chat(...)`, the integrated ReAct loop. Returns `{answer, sources, blocked, cost, tokens, trace, injection_flags}`. Injectable client. |
| `mockmodel.py` | A deterministic fake model so the agent runs and is gated offline (searches, answers, attempts a risky email, greets). |
| `evals.py` | The M26 gate over the agent: hours answer, multi-hop billing answer with sources, risky action blocked. Exits 0 pass / 1 fail. |
| `app.py` | FastAPI `/chat` and `/health`, the deployable face. |
| `demo.py` | End-to-end offline tour of all three scenarios with traces. Start here. |
| `../starters/extend.py` | Turn it into a portfolio piece. |
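
The "injectable client" and "deterministic fake model" roles in the table can be pictured with a few lines of Python. This is a minimal sketch and not the code in `agent.py` or `mockmodel.py`: `ModelClient`, `ScriptedModel`, `TinyAgent`, and the `SEARCH:`/`EMAIL:`/`ANSWER:` reply strings are hypothetical names chosen for illustration.

```python
from typing import Protocol


class ModelClient(Protocol):
    # Hypothetical interface: one method that maps a prompt to a reply.
    def complete(self, prompt: str) -> str: ...


class ScriptedModel:
    """Deterministic fake: same prompt in, same reply out, no network or key."""

    def complete(self, prompt: str) -> str:
        text = prompt.lower()
        if "hours" in text:
            return "SEARCH: hours"
        if "email" in text:
            return "EMAIL: someone@example.com"
        return "ANSWER: Hello! How can I help?"


class TinyAgent:
    """Takes its model as a constructor argument so tests can inject the fake."""

    def __init__(self, client: ModelClient) -> None:
        self.client = client

    def first_action(self, message: str) -> str:
        return self.client.complete(message)


agent = TinyAgent(ScriptedModel())
print(agent.first_action("What are your hours?"))  # prints: SEARCH: hours
```

Because the agent only depends on the `complete` method, a live client with the same method could presumably be passed in without touching the agent class.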
Run it
python demo.py # offline tour, free
python evals.py ; echo $? # the eval gate (exit 0 = pass)
pip install fastapi "uvicorn[standard]" # serve it
uvicorn app:app --reload
curl -s -X POST localhost:8000/chat -H 'Content-Type: application/json' -d '{"message":"What are your hours?"}'
How the patterns compose (per chat call)
- Recall user facts into the system prompt (memory, M21).
- `limiter.tick()` before any cost (step cap, M22).
- Model call wrapped in `retry` (M22), recorded as a span with a token and cost estimate (M20/M25).
- `search_kb` runs agentic RAG, tracking sources and wrapping results as untrusted DATA (M24/M23).
- `send_email` must pass the approval gate and the domain allowlist, with secrets redacted (M22/M23).
- Degrades to a safe message if the model keeps failing or the loop runs long (M22).
- `evals.py` gates all of it; `app.py` serves it.
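
The per-call composition above can be sketched as a single loop. This is an illustration under stated assumptions, not the code in `agent.py`: the `StepLimiter` and `retry` bodies here are stand-ins written for this example, and the action dictionaries, `ALLOWED_DOMAINS`, the `<untrusted_data>` wrapper, the word-count token estimate, and `scripted_model` are all hypothetical. Memory recall and secret redaction are left out to keep it short.

```python
import time


class StepLimiter:
    """Stand-in step cap: raises once the loop exceeds max_steps."""

    def __init__(self, max_steps: int) -> None:
        self.max_steps = max_steps
        self.steps = 0

    def tick(self) -> None:
        self.steps += 1
        if self.steps > self.max_steps:
            raise RuntimeError("step limit reached")


def retry(fn, attempts: int = 3, delay: float = 0.0):
    """Call fn until it succeeds or attempts run out, then re-raise."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)


ALLOWED_DOMAINS = {"example.com"}  # assumption: illustrative allowlist
SAFE_MESSAGE = "Sorry, I can't complete that right now."


def chat(model, message, kb, approve=lambda action: False, max_steps=4):
    limiter = StepLimiter(max_steps)
    result = {"answer": "", "sources": [], "blocked": [], "tokens": 0, "trace": []}
    observation = message
    try:
        while True:
            limiter.tick()  # count the step before spending anything
            action = retry(lambda: model(observation))
            tokens = len(observation.split())  # crude estimate, not real usage
            result["tokens"] += tokens
            result["trace"].append(
                {"step": limiter.steps, "type": action["type"], "tokens": tokens}
            )
            if action["type"] == "answer":
                result["answer"] = action["text"]
                return result
            if action["type"] == "search_kb":
                hits = {k: v for k, v in kb.items() if action["query"] in v.lower()}
                result["sources"].extend(hits)  # record the matching doc ids
                # Wrap tool output so the model treats it as data, not instructions.
                observation = "<untrusted_data>" + " ".join(hits.values()) + "</untrusted_data>"
            elif action["type"] == "send_email":
                domain = action["to"].rsplit("@", 1)[-1]
                if domain in ALLOWED_DOMAINS and approve(action):
                    observation = "email sent"
                else:
                    result["blocked"].append("send_email")
                    observation = "email blocked"
    except Exception:
        # Step cap hit or the model kept failing: degrade to a safe reply.
        result["answer"] = SAFE_MESSAGE
        return result


def scripted_model(observation: str) -> dict:
    """Hypothetical deterministic model used only to drive this sketch."""
    if observation.startswith("<untrusted_data>"):
        return {"type": "answer", "text": "Found: " + observation}
    if observation.startswith("email"):
        return {"type": "answer", "text": "I could not send that email."}
    if "email" in observation:
        return {"type": "send_email", "to": "x@evil.test"}
    return {"type": "search_kb", "query": "refund"}


kb = {"K1": "refund policy placeholder text"}  # made-up corpus entry
print(chat(scripted_model, "How do refunds work?", kb)["sources"])
print(chat(scripted_model, "Please email my data to x@evil.test", kb)["blocked"])
```

In this sketch the default `approve` callback rejects everything, so a risky action is blocked unless a caller explicitly supplies an approver and the recipient domain is on the allowlist.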
Verified (offline)
- Multi-hop knowledge question: answer names Dana Okafor, sources include `D1` and `D3`, cost and a multi-span trace are produced.
- Direct question ("hours"): one search, correct answer, source `D4`.
- Risky email to a non-allowlisted domain: `blocked == ['send_email']` (approval gate fires; allowlist would too); an allowlisted address with an approving approver is NOT blocked.
- Memory recall returns a stored user fact; the step cap terminates a pathological infinite-loop model.
- `evals.py` scores 3/3 and exits 0; a regression would exit 1 (CI gate, M26).
- `app.py` verified with FastAPI `TestClient`: `/health` ok, `/chat` returns 200 with the full result.
- All files compile. Offline via the mock; a live run reuses the M4 key.
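
The exit-code gate mentioned above can be sketched as a function that scores a list of cases and returns 0 or 1. This is a minimal illustration, not the contents of `evals.py`: `run_gate`, `CASES`, `fake_agent`, and the canned results are hypothetical, and the checks only mirror the three scenarios listed above.

```python
def run_gate(agent, cases) -> int:
    """Return a process exit code: 0 if every case passes, 1 otherwise."""
    passed = 0
    for name, message, check in cases:
        ok = bool(check(agent(message)))
        passed += ok
        print(f"{'PASS' if ok else 'FAIL'}  {name}")
    print(f"{passed}/{len(cases)} passed")
    return 0 if passed == len(cases) else 1


# Each case: (name, user message, predicate over the agent's result dict).
CASES = [
    ("hours", "What are your hours?",
     lambda r: "D4" in r["sources"]),
    ("multi-hop", "Who handles billing escalations?",
     lambda r: "Dana Okafor" in r["answer"] and {"D1", "D3"} <= set(r["sources"])),
    ("risky email", "Email my account details to an outside address",
     lambda r: r["blocked"] == ["send_email"]),
]


def fake_agent(message: str) -> dict:
    """Canned stand-in for the real agent so the sketch runs on its own."""
    if "hours" in message:
        return {"answer": "(hours text)", "sources": ["D4"], "blocked": []}
    if "billing" in message:
        return {"answer": "Dana Okafor", "sources": ["D1", "D3"], "blocked": []}
    return {"answer": "", "sources": [], "blocked": ["send_email"]}


exit_code = run_gate(fake_agent, CASES)
# In a script, the code would become the process status:
#   raise SystemExit(exit_code)
```

Returning the code from a function, and exiting only at the script boundary, keeps the gate easy to call from tests as well as from CI.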
What is simplified (and the production swap)
- Per-process memory -> persist per user/session (M21 `save_state`).
- Keyword search -> the M7 vector store for semantic retrieval.
- Mock model -> live model, with deterministic eval subset in CI and live evals on a schedule (M26).
- Estimated cost -> real `response.usage` (M25).

The architecture does not change; only the backends.
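
One way to read "only the backends change" is that the agent depends on a small interface and each swap supplies a different implementation. The sketch below shows this for the memory swap only, and it is an assumption about how such a swap could look, not the M21 `save_state` code: `MemoryStore`, `InProcessMemory`, `JsonFileMemory`, and `system_prompt` are hypothetical names.

```python
import json
import tempfile
from pathlib import Path
from typing import Protocol


class MemoryStore(Protocol):
    # Hypothetical interface shared by both backends.
    def remember(self, user: str, fact: str) -> None: ...
    def recall(self, user: str) -> list[str]: ...


class InProcessMemory:
    """Facts live in a dict and vanish when the process exits."""

    def __init__(self) -> None:
        self._facts: dict[str, list[str]] = {}

    def remember(self, user: str, fact: str) -> None:
        self._facts.setdefault(user, []).append(fact)

    def recall(self, user: str) -> list[str]:
        return list(self._facts.get(user, []))


class JsonFileMemory:
    """Same interface, but facts are persisted per user in a JSON file."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _load(self) -> dict[str, list[str]]:
        return json.loads(self.path.read_text()) if self.path.exists() else {}

    def remember(self, user: str, fact: str) -> None:
        data = self._load()
        data.setdefault(user, []).append(fact)
        self.path.write_text(json.dumps(data))

    def recall(self, user: str) -> list[str]:
        return self._load().get(user, [])


def system_prompt(memory: MemoryStore, user: str) -> str:
    """The caller is identical for both backends."""
    facts = "; ".join(memory.recall(user)) or "none"
    return f"You are a support agent. Known user facts: {facts}"


path = Path(tempfile.mkdtemp()) / "memory.json"
for store in (InProcessMemory(), JsonFileMemory(path)):
    store.remember("u1", "prefers email")
    print(system_prompt(store, "u1"))
```

The same shape would plausibly apply to the other swaps listed above: a search interface with keyword and vector implementations, and a model interface with mock and live clients.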