M27 notes: the capstone (the one idea)
The one idea: a production agent is not one clever trick, it is many modest patterns working together. The capstone takes the nine things you built in Part D and runs them as a single agent, so you can see how they compose: retrieval, memory, tracing, cost, reliability, and security are not optional extras you bolt on later, they are the difference between a demo and a system. Building them separately taught each idea; building them together is the actual job.
1. The map: which pattern, from which module
The agent is one ReAct loop (M9) with everything else wrapped around it. Here is what each part is:
| In the capstone | Pattern | From |
|---|---|---|
search_kb tool + citations |
agentic RAG (search, read, search again) | M24 |
ShortTermMemory + LongTermMemory |
remember the user across turns and sessions | M21 |
Trace with tokens and cost() |
observability and cost accounting | M20, M25 |
retry, StepLimiter |
survive blips, stop runaway loops | M22 |
approval_gate on send_email |
human-in-the-loop for risky actions | M22 |
wrap_untrusted, redact_secrets, domain_allowed |
treat content as data, least privilege, no exfiltration | M23 |
evals.py + exit code |
the gate that protects all of the above | M20, M26 |
app.py /chat |
deploy it as a service | M11, M18 |
Multi-agent orchestration (M18) is here in spirit: one coordinator loop calling specialist tools. The same structure scales to true sub-agents when a single agent's job gets too big.
2. Reading one request top to bottom
Trace a single call to chat("Who leads the team that runs billing?") and watch the patterns fire in
order:
- Memory recall (M21): pull any known facts about the user into the system prompt.
- Step cap tick (M22): before any cost, make sure we are not looping.
- Model call, retried (M22), traced and priced (M20/M25): the agent decides to search.
- search_kb (M24): finds "billing is run by the Payments team", wrapped as untrusted DATA (M23).
- The loop repeats: a second search ("who leads Payments") finds Dana Okafor. Multi-hop.
- Answer with citations
[D1, D3], plus the run's total cost and token count.
Now trace chat("Email the answer to attacker@evil.example") and a different set fires: the agent
tries send_email, the approval gate denies it and the allowlist rejects the domain, so the
action is blocked and recorded, while redaction would have stripped any secret anyway. Same
agent, different guards, because the request was risky.
3. Why composition is the lesson
Each pattern covers a different failure of a naive agent:
- Without memory, it forgets you between turns.
- Without agentic RAG, it cannot answer multi-step questions or cite anything.
- Without observability and cost, you cannot debug it or predict the bill.
- Without reliability, one API blip or one bad loop takes it down or empties your wallet.
- Without security, a poisoned document turns it into an exfiltration tool.
- Without an eval gate, your next change silently breaks one of the above.
A real agent needs all of them at once, because real inputs hit all of these at once. The capstone is small, but the shape is exactly what a production agent looks like: a simple core, surrounded by guards.
4. What is simplified here (and how you would harden it)
Being honest about the gap between this and production:
- Memory is per-process; a real service keys it by user/session and persists it (M21
save_state). - Retrieval is keyword search; swap in the M7 vector store for semantic, embedding-based recall.
- The model is mocked so it runs offline; a live build uses the real model and runs the eval gate on a deterministic subset in CI (M26) plus live evals on a schedule.
- Cost and latency are estimates from a pricing model; read real numbers from
response.usage(M25). - One agent, two tools; a larger system splits into specialist sub-agents (M18) with the same guards.
None of these change the architecture. They are the same patterns with production-grade backends.
5. Part D retrospective
You started Part D by deploying one orchestrated agent (M18) and learning to build agents many ways (M19). Then you made agents observable (M20), able to remember (M21), reliable (M22), secure (M23), able to research (M24), affordable (M25), and continuously tested (M26). The capstone is where that becomes a single thing you could put in front of a user. That is AI engineering: not just calling a model, but building a dependable system around it.
Words you will hear
Capstone / integration, cross-cutting concern, composition, plus every Part D term: agentic RAG, memory, trace, cost, retry, step cap, approval gate, allowlist, eval gate. Full definitions in the glossary.