M27 solution: Part D capstone, a complete support agent

One deployable agent that uses every Part D pattern at once: agentic RAG with citations (M24), memory (M21), tracing and cost (M20/M25), reliability (M22), security guards (M22/M23), an eval gate (M20/M26), and an API (M11/M18). The whole thing runs offline with a deterministic mock model: no key, no spend.

Files

| File | Role |
| --- | --- |
| `parts.py` | The Part D building blocks, trimmed and gathered: `Trace`+`cost` (M20/M25), `ShortTermMemory`/`LongTermMemory` (M21), `retry`/`StepLimiter`/`approval_gate` (M22), `detect_injection`/`wrap_untrusted`/`redact_secrets`/`domain_allowed` (M23). Each is labeled with its source module. |
| `corpus.py` | The knowledge base + `search` tool for agentic RAG (M24); includes a two-hop question. |
| `agent.py` | `SupportAgent.chat(...)`, the integrated ReAct loop. Returns `{answer, sources, blocked, cost, tokens, trace, injection_flags}`. Injectable client. |
| `mockmodel.py` | A deterministic fake model so the agent runs and is gated offline (searches, answers, attempts a risky email, greets). |
| `evals.py` | The M26 gate over the agent: hours answer, multi-hop billing answer with sources, risky action blocked. Exits 0 pass / 1 fail. |
| `app.py` | FastAPI `/chat` and `/health`, the deployable face. |
| `demo.py` | End-to-end offline tour of all three scenarios with traces. Start here. |
| `../starters/extend.py` | Turn it into a portfolio piece. |
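
As a rough illustration of the M23 guards that `parts.py` gathers, the sketch below shows one way such helpers could look. The four function names come from the table above; the signatures, the regex patterns, the delimiter text, and the `ALLOWED_DOMAINS` set are assumptions made for illustration, not the contents of `parts.py`.

```python
import re

# Hypothetical patterns and allowlist; the real parts.py may differ.
INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"reveal (the |your )?system prompt",
]
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{8,}")
ALLOWED_DOMAINS = {"example.com"}


def detect_injection(text: str) -> list[str]:
    """Return the patterns that match, so the caller can record them as flags."""
    lowered = text.lower()
    return [p for p in INJECTION_PATTERNS if re.search(p, lowered)]


def wrap_untrusted(text: str) -> str:
    """Delimit retrieved text so the prompt can treat it as data, not instructions."""
    return f"[UNTRUSTED DATA]\n{text}\n[END UNTRUSTED DATA]"


def redact_secrets(text: str) -> str:
    """Replace anything that looks like an API key with a placeholder."""
    return SECRET_PATTERN.sub("[REDACTED]", text)


def domain_allowed(address: str) -> bool:
    """Allow an email address only if its domain is on the allowlist."""
    _, _, domain = address.rpartition("@")
    return domain.lower() in ALLOWED_DOMAINS
```

Pattern matching of this kind only catches known phrasings, which is presumably why the agent also wraps retrieved text as untrusted data rather than relying on detection alone.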

Run it

python demo.py            # offline tour, free
python evals.py ; echo $? # the eval gate (exit 0 = pass)

pip install fastapi "uvicorn[standard]"   # serve it
uvicorn app:app --reload
curl -s -X POST localhost:8000/chat -H 'Content-Type: application/json' -d '{"message":"What are your hours?"}'

How the patterns compose (per `chat` call)

Recall user facts into the system prompt (memory, M21).
Call `limiter.tick()` before any model call incurs cost (step cap, M22).
Wrap the model call in `retry` (M22) and record it as a span with a token and cost estimate (M20/M25).
`search_kb` runs agentic RAG, tracking sources and wrapping results as untrusted DATA (M24/M23).
`send_email` must pass the approval gate and the domain allowlist, with secrets redacted (M22/M23).
The agent degrades to a safe message if the model keeps failing or the loop runs long (M22).
`evals.py` gates all of it; `app.py` serves it.
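
The steps above can be sketched as a single loop. This is a minimal illustration, not the code in `agent.py`: the `StepLimiter` and `retry` helpers are simplified stand-ins, and the model's action format (a dict with either `tool` or `answer`), the `scripted_model` function, and the four-characters-per-token estimate are all assumptions.

```python
import time


class StepLimiter:
    """Hypothetical step cap: raises once the loop exceeds max_steps."""

    def __init__(self, max_steps: int) -> None:
        self.max_steps = max_steps
        self.steps = 0

    def tick(self) -> None:
        self.steps += 1
        if self.steps > self.max_steps:
            raise RuntimeError("step limit reached")


def retry(fn, attempts: int = 3, delay: float = 0.0):
    """Call fn until it succeeds or attempts run out, then re-raise."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(delay)


def chat(message: str, model, search, max_steps: int = 4) -> dict:
    limiter = StepLimiter(max_steps)
    history = [message]
    result = {"answer": None, "sources": [], "blocked": [], "trace": [], "tokens": 0}
    try:
        while True:
            limiter.tick()  # step cap first, before any model cost
            action = retry(lambda: model(history))
            tokens = sum(len(h) for h in history) // 4  # rough estimate (assumption)
            result["tokens"] += tokens
            result["trace"].append({"span": "model", "tokens": tokens})
            if "answer" in action:
                result["answer"] = action["answer"]
                return result
            if action["tool"] == "search_kb":
                doc_id, text = search(action["query"])
                result["sources"].append(doc_id)
                history.append(f"[UNTRUSTED DATA] {text} [END UNTRUSTED DATA]")
            elif action["tool"] == "send_email":
                result["blocked"].append("send_email")  # no approver in this sketch
                history.append("send_email was blocked")
    except Exception:
        # Degrade to a safe message when the model keeps failing or loops too long.
        result["answer"] = "Sorry, I could not complete that request."
        return result


def scripted_model(history):
    # Hypothetical deterministic model: search once, then answer.
    if len(history) == 1:
        return {"tool": "search_kb", "query": "hours"}
    return {"answer": "See D4 for our hours."}


if __name__ == "__main__":
    print(chat("What are your hours?", scripted_model, lambda q: ("D4", "placeholder text")))
```

Passing `model` and `search` as arguments mirrors the injectable-client idea: the same loop runs against a scripted function offline or a live client later.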

Verified (offline)

Multi-hop knowledge question: the answer names Dana Okafor, sources include D1 and D3, and a cost and a multi-span trace are produced.
Direct question ("hours"): one search, correct answer, source D4.
Risky email to a non-allowlisted domain: `blocked == ['send_email']`. The approval gate fires first, and the allowlist would also reject it. An allowlisted address with an approving approver is not blocked.
Memory recall returns a stored user fact; the step cap terminates a pathological infinite-loop model.
`evals.py` scores 3/3 and exits 0; a regression would exit 1 (CI gate, M26).
`app.py` verified with FastAPI `TestClient`: `/health` returns ok, `/chat` returns 200 with the full result.
All files compile. Everything above runs offline via the mock; a live run reuses the M4 key.
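
The gate behaviour described above (score every case, exit 0 only if all pass) can be sketched as follows. The `run_gate` helper, the case names, and the stub agent are hypothetical; the real cases live in `evals.py`.

```python
import sys


def run_gate(agent, cases) -> int:
    """Run each (name, message, check) case; return 0 if all pass, else 1."""
    passed = 0
    for name, message, check in cases:
        try:
            ok = bool(check(agent(message)))
        except Exception:
            ok = False  # a crashing case counts as a failure, not a skipped case
        print(f"{'PASS' if ok else 'FAIL'}  {name}")
        passed += ok
    print(f"{passed}/{len(cases)} passed")
    return 0 if passed == len(cases) else 1


def stub_agent(message: str) -> dict:
    # Hypothetical stand-in for SupportAgent.chat, returning a fixed shape.
    blocked = ["send_email"] if "email" in message else []
    return {"answer": "stub", "sources": ["D4"], "blocked": blocked}


CASES = [
    ("has sources", "What are your hours?", lambda r: "D4" in r["sources"]),
    ("risky action blocked", "email this out", lambda r: r["blocked"] == ["send_email"]),
]

if __name__ == "__main__":
    sys.exit(run_gate(stub_agent, CASES))
```

Returning the exit code from a function, and calling `sys.exit` only under `__main__`, keeps the gate importable from tests while still failing a CI job on any regression.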

What is simplified (and the production swap)

Per-process memory -> persist per user/session (M21 `save_state`).
Keyword search -> the M7 vector store for semantic retrieval.
Mock model -> live model, with a deterministic eval subset in CI and live evals on a schedule (M26).
Estimated cost -> real `response.usage` (M25).
The architecture does not change; only the backends do.
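
One way to picture the estimated-to-real cost swap is a cost function that prefers reported usage and falls back to an estimate when none is available. This is a sketch under stated assumptions: the `Usage` dataclass is a hypothetical stand-in for a live response's usage object (field names vary by provider), the prices are placeholders, and the four-characters-per-token heuristic is only a rough guess.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder prices per 1,000 tokens; substitute your provider's real rates.
PRICE_IN_PER_1K = 0.001
PRICE_OUT_PER_1K = 0.002


@dataclass
class Usage:
    """Hypothetical stand-in for the usage object a live response carries."""
    input_tokens: int
    output_tokens: int


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4)


def call_cost(prompt: str, completion: str, usage: Optional[Usage] = None) -> float:
    """Use reported usage when present; otherwise estimate from text length."""
    if usage is None:
        usage = Usage(estimate_tokens(prompt), estimate_tokens(completion))
    cost_in = usage.input_tokens / 1000 * PRICE_IN_PER_1K
    cost_out = usage.output_tokens / 1000 * PRICE_OUT_PER_1K
    return cost_in + cost_out
```

Callers keep the same signature in both modes, so the offline mock path and a live path differ only in whether `usage` is supplied.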