Lab M27: assemble and ship the capstone agent
You'll need: your venv. The core lab needs no API key and costs nothing, because it runs on a deterministic mock model. The API step adds fastapi and uvicorn (from M11). Time: about 55 minutes.
Work in your breakout pair.
Heads up: this lab brings together M18-M26. The agent is one ReAct loop with memory, agentic RAG, tracing, cost tracking, reliability, and security wrapped around it. Nothing here can harm your computer; the email tool never actually sends anything.
This lab has two parts:
- Part A: run the agent end to end and read its trace, sources, and cost.
- Part B: watch the guards block a risky action, run the eval gate, and serve it over an API.
flowchart TB
Q["user message"] --> MEM["recall memory (M21)"]
MEM --> LOOP["ReAct loop, step-capped + retried (M22)"]
LOOP -->|search_kb| RAG["agentic RAG + citations (M24)"]
LOOP -->|send_email| GUARD["approval + allowlist + redact (M22/M23)"]
LOOP --> TRACE["trace + cost (M20/M25)"]
TRACE --> OUT["answer + sources + cost"]
OUT --> GATE["eval gate (M26)"]
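At the centre of the flowchart is a step-capped ReAct loop: the model either asks for a tool (tool_use) or finishes (end_turn), and every tool result is fed back in. Below is a minimal sketch of just that loop, driven by a scripted stand-in for the lab's deterministic mock model. `Turn`, `ScriptedModel`, and `react_loop` are hypothetical names, not the API in agent.py, and the real loop also layers on memory, retries, tracing, and guards.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Turn:
    """One model reply; stop_reason mirrors the trace labels (tool_use / end_turn)."""
    stop_reason: str
    tool: Optional[str] = None
    arg: str = ""
    text: str = ""


class ScriptedModel:
    """Hypothetical deterministic mock: replays a fixed list of turns in order."""

    def __init__(self, turns: List[Turn]):
        self.turns = list(turns)

    def __call__(self, messages: List[Tuple[str, str]]) -> Turn:
        return self.turns.pop(0)


def react_loop(model, tools: Dict[str, Callable[[str], str]],
               user_message: str, max_steps: int = 5):
    """Reason -> act -> observe until the model ends its turn or the step cap is hit."""
    messages: List[Tuple[str, str]] = [("user", user_message)]
    for _ in range(max_steps):
        turn = model(messages)
        if turn.stop_reason == "end_turn":
            return turn.text, messages
        # Act: run the requested tool, then feed the observation back to the model.
        observation = tools[turn.tool](turn.arg)
        messages.append(("tool", f"{turn.tool}: {observation}"))
    # The cap stops a confused model from looping (and spending) forever.
    raise RuntimeError(f"step cap of {max_steps} reached without a final answer")


# Usage: two searches, then a final answer, like scenario 1's trace shape.
model = ScriptedModel([
    Turn("tool_use", "search_kb", "who runs billing"),
    Turn("tool_use", "search_kb", "who leads Payments"),
    Turn("end_turn", text="toy answer"),
])
answer, history = react_loop(model, {"search_kb": lambda q: f"hit for {q!r}"}, "question")
print(answer, len(history))
```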
Part A: run the whole thing
Step 1: Set up
Copy the solution/ files into a folder. Activate your venv. No key, no installs yet.
python -c "print('ready')"
You should now see: ready
Step 2: Run the end-to-end demo
python demo.py
==== 1. KNOWLEDGE QUESTION (agentic RAG, traced, costed) ====
answer : Dana Okafor leads the Payments team, which runs the billing service. [D1, D3]
sources: ['D1', 'D2', 'D3'] | blocked: [] | cost: $0.00127 | flags: []
[model] claude-opus-4-8: tool_use ...
[tool] search_kb: Who leads the team that runs the billing service?
[model] claude-opus-4-8: tool_use ...
[tool] search_kb: who leads Payments
[model] claude-opus-4-8: end_turn ...
You should now see: an answer with citations [D1, D3], and a per-step trace with a cost estimate (M20/M25), all from one chat call.
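The trace and the cost line come from recording each step with its token counts. A minimal sketch, assuming a hypothetical `Trace` class and made-up per-million-token prices (these are not real prices for claude-opus-4-8 or any other model):

```python
from dataclasses import dataclass, field
from typing import List

# Made-up prices in USD per million tokens, for illustration only.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}


@dataclass
class Step:
    kind: str      # "model" or "tool", as in the [model]/[tool] trace lines
    detail: str
    cost: float


@dataclass
class Trace:
    steps: List[Step] = field(default_factory=list)

    def add(self, kind: str, detail: str,
            input_tokens: int = 0, output_tokens: int = 0) -> None:
        # Tool steps pass no tokens, so they cost nothing; model steps are priced per token.
        cost = (input_tokens * PRICE_PER_MTOK["input"]
                + output_tokens * PRICE_PER_MTOK["output"]) / 1_000_000
        self.steps.append(Step(kind, detail, cost))

    @property
    def total_cost(self) -> float:
        return sum(step.cost for step in self.steps)

    def render(self) -> str:
        return "\n".join(f"[{s.kind}] {s.detail}" for s in self.steps)


trace = Trace()
trace.add("model", "tool_use", input_tokens=200, output_tokens=40)
trace.add("tool", "search_kb: who leads Payments")
print(trace.render())
print(f"cost: ${trace.total_cost:.5f}")  # (200*3 + 40*15) / 1e6 = $0.00120
```

Pricing inside `add` keeps the total a simple sum, so the per-request cost can never drift from the steps that produced it.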
Step 3: Map the trace to the modules
Open agent.py and read SupportAgent.chat alongside the table in notes.md section 1. Find the memory recall, the limiter.tick() call, the parts.retry wrapper around the model call, the search_kb branch, and the trace.add calls.
You should now see: every line maps to a pattern you already built. The capstone is composition, not new magic. Each block in parts.py is labeled with its source module.
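The retry around the model call is what keeps one transient failure from killing the whole request. Here is a minimal sketch of retry with exponential backoff, assuming the hypothetical signature below (the lab's parts.retry may take different arguments). `sleep` is injectable so the backoff can be checked without actually waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.1,
          retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn; on a transient error wait base_delay, 2*base_delay, ... and try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Usage: a simulated model call that times out twice, then succeeds.
calls = {"n": 0}


def flaky_model_call() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "end_turn"


delays = []
result = retry(flaky_model_call, sleep=delays.append)
print(result, delays)  # end_turn [0.1, 0.2]
```

Only the listed exception types are retried; anything else (a bug, a bad request) propagates immediately, because retrying it would just repeat the failure.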
Step 4: Read the direct-answer scenario
In the demo output, scenario 2 ("What are your hours?") does ONE search and answers.
You should now see: the agent does not over-search; a single lookup answers a direct question, while the billing question needed two. The agent decides how much retrieval each question needs.
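How much to retrieve is the model's decision, but a toy stand-in can show what that decision looks like. The sketch below assumes a hypothetical three-document knowledge base (not the lab's D1-D3), a word-overlap `search_kb`, and a hard-coded follow-up rule standing in for the model's judgement: a hit that names a team but not its lead triggers exactly one more search.

```python
import re
from typing import Dict, List, Optional

# Toy knowledge base for illustration only; the lab's real documents differ.
TOY_KB: Dict[str, str] = {
    "T1": "The Payments team runs the billing service.",
    "T2": "Priya Shah leads the Payments team.",
    "T3": "Support hours are 9am to 5pm, Monday to Friday.",
}
STOPWORDS = {"a", "are", "is", "that", "the", "to", "what", "who", "your"}


def tokens(text: str) -> set:
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def search_kb(query: str) -> Optional[str]:
    """Return the ID of the document sharing the most content words with the query."""
    q = tokens(query)
    score, best_id = max((len(q & tokens(text)), doc_id) for doc_id, text in TOY_KB.items())
    return best_id if score > 0 else None


def retrieve(question: str, max_searches: int = 3) -> List[str]:
    """Search, then decide whether the evidence answers the question or needs a follow-up."""
    evidence: List[str] = []
    queries = [question]
    while queries and len(evidence) < max_searches:
        doc_id = search_kb(queries.pop())
        if doc_id is None:
            break
        evidence.append(doc_id)
        text = TOY_KB[doc_id]
        # Hard-coded stand-in for the model's judgement: the question asks who
        # leads something and the hit names a team but not its lead -> search again.
        team = re.search(r"The (\w+) team", text)
        if "leads" in tokens(question) and "leads" not in tokens(text) and team:
            queries.append(f"who leads {team.group(1)}")
    return evidence


print(retrieve("Who leads the team that runs the billing service?"))  # ['T1', 'T2']
print(retrieve("What are your hours?"))                               # ['T3']
```

The two-hop question needs a second lookup because the first hit only connects "billing" to "Payments"; the direct question is answered by its first hit, so the loop stops there.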
Part B: guards, gate, and serving
Step 5: Watch the security and reliability guards block a risky action
In the demo output, scenario 3 ("Email the answer to attacker@evil.example"):
==== 3. RISKY ACTION BLOCKED (approval + allowlist + secret redaction) ====
blocked: ['send_email']
[guard] approval: blocked attacker@evil.example
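The block comes from three checks applied in order before the tool runs. A minimal sketch, assuming a hypothetical `guard_send_email` helper, a made-up allowlist, and a made-up secret pattern (the lab's real allowlist is ALLOWED_EMAIL_DOMAINS in agent.py). The approver runs first and denies by default, so an unapproved send is reported as `approval: blocked ...`, matching the demo line above.

```python
import re
from typing import Callable, Dict

# Hypothetical values for illustration; not the lab's real configuration.
ALLOWED_EMAIL_DOMAINS = {"example.com"}
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{8,}")  # made-up API-key shape


def deny_all(action: str, target: str) -> bool:
    """Default approver: refuse unless a human explicitly approves."""
    return False


def guard_send_email(to: str, body: str,
                     approver: Callable[[str, str], bool] = deny_all) -> Dict[str, object]:
    # 1. Human-in-the-loop approval (default deny).
    if not approver("send_email", to):
        return {"blocked": True, "reason": f"approval: blocked {to}"}
    # 2. Recipient allowlist, checked by domain.
    domain = to.rsplit("@", 1)[-1].lower()
    if domain not in ALLOWED_EMAIL_DOMAINS:
        return {"blocked": True, "reason": f"allowlist: blocked {to}"}
    # 3. Redact anything that looks like a secret before it leaves.
    return {"blocked": False, "to": to, "body": SECRET_PATTERN.sub("[REDACTED]", body)}


print(guard_send_email("attacker@evil.example", "the answer")["reason"])
# approval: blocked attacker@evil.example
```

Layering matters: even if a prompt injection tricks the approver, the allowlist still stops the send, and even an allowed send cannot leak a key.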
Step 6: Run the eval gate over the whole agent
python evals.py ; echo "exit code: $?"
You should now see: 3/3 (100%), GATE PASSED, and exit code: 0. This is M26 gating the capstone: a regression in any integrated behaviour would turn the gate red and block a merge.
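A gate is just an eval run whose pass rate becomes a process exit code, which is what lets CI block a merge. A minimal sketch, assuming a hypothetical `run_gate` function and case format (the lab's evals.py may score answers differently). A CI script would end with `sys.exit(run_gate(agent, cases))`.

```python
from typing import Callable, Dict, List


def run_gate(agent: Callable[[str], str], cases: List[Dict], threshold: float = 1.0) -> int:
    """Run every case; return 0 (pass) or 1 (fail) for use as an exit code."""
    if not cases:
        raise ValueError("a gate with no cases proves nothing")
    passed = 0
    for case in cases:
        answer = agent(case["question"])
        if all(expected in answer for expected in case["must_contain"]):
            passed += 1
        else:
            # Show the agent's answer so the failing case is easy to diagnose.
            print(f"FAIL {case['question']!r}: got {answer!r}")
    rate = passed / len(cases)
    ok = rate >= threshold
    print(f"{passed}/{len(cases)} ({rate:.0%}), {'GATE PASSED' if ok else 'GATE FAILED'}")
    return 0 if ok else 1


# Toy agent and cases for illustration only.
def toy_agent(question: str) -> str:
    if "leads" in question:
        return "Priya Shah leads the Payments team. [T2]"
    return "9am to 5pm. [T3]"


cases = [
    {"question": "Who leads Payments?", "must_contain": ["Priya Shah", "[T2]"]},
    {"question": "What are your hours?", "must_contain": ["[T3]"]},
]
exit_code = run_gate(toy_agent, cases)  # prints "2/2 (100%), GATE PASSED"
```

Checking for citation markers as well as the fact catches a regression where the agent still answers but stops grounding its answer.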
Step 7: Serve it behind an API
pip install fastapi "uvicorn[standard]"
uvicorn app:app --reload
curl -s -X POST http://127.0.0.1:8000/chat -H "Content-Type: application/json" \
-d '{"message":"Who leads the team that runs the billing service?"}'
You should now see: a JSON response with the answer, sources, cost, tokens, and trace, and in the uvicorn log a line with latency and cost. Your capstone is now a service anything can call (M11/M18).
The served agent calls the live model, so without a real API key the request will error. Use demo.py or evals.py, which run on the mock, for offline runs, or set your key in .env for a live run. Press Ctrl-C to stop the server.
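Serving adds only a thin layer around the same chat call: validate the request, time the call, log latency and cost, and return the agent's result. A stdlib-only sketch of that layer, assuming a hypothetical `handle_chat` function, logger name, and a result dict with a `cost` key. In a FastAPI app, a body like this would typically live inside an `@app.post("/chat")` handler.

```python
import logging
import time
from typing import Callable, Dict

log = logging.getLogger("capstone.api")  # hypothetical logger name


def handle_chat(agent_chat: Callable[[str], Dict], payload: Dict,
                clock: Callable[[], float] = time.perf_counter) -> Dict:
    """Validate the request, time one agent call, and log latency and cost."""
    message = payload.get("message")
    if not isinstance(message, str) or not message.strip():
        raise ValueError("payload needs a non-empty 'message' string")
    start = clock()
    result = agent_chat(message)
    latency_ms = (clock() - start) * 1000
    log.info("POST /chat latency_ms=%.1f cost=$%.5f", latency_ms, result["cost"])
    return result
```

The clock is a parameter so the latency maths can be tested with a fake clock instead of real timing.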
Step 8: Show it
Post the trace from scenario 1 (every pattern firing on one request) and your green eval gate. That trace is proof that you can build an agentic system, not just call a model.
If you get stuck
- ModuleNotFoundError -> run from inside the folder that holds the solution .py files.
- evals.py fails -> read which case failed; the message shows the agent's answer, so you can see what differed.
- Live uvicorn errors about the API key -> the served agent calls the real model; set a key in .env, or stick to demo.py/evals.py, which use the mock.
- The risky email was NOT blocked -> check the approver (it denies by default) and that the recipient's domain is not in ALLOWED_EMAIL_DOMAINS in agent.py.