M18 notes: Multi-agent orchestration (the one idea)

The one idea: instead of one agent trying to do everything, use a coordinator (orchestrator) that hands pieces of the job to focused specialist sub-agents, each of which can reach real systems through connectors. Same move a good team makes, and exactly what you deploy today.


1. Why not just one big agent?

A single agent with a giant prompt and ten tools can work, but it tends to:

  • lose focus: a prompt that says "triage AND enrich AND correlate AND write the report" pulls the model in four directions;
  • be hard to debug: when the output is wrong, which of the ten responsibilities failed?
  • be hard to improve: tweaking the report wording risks breaking triage.

Splitting into specialists fixes all three. Each sub-agent has one job and a short, sharp system prompt. You can test, swap, or upgrade one without touching the others. (This is the same instinct behind microservices, or behind a SOC team having an L1, an L2, and a lead.)

Analogy. A hospital ER doesn't have one doctor do triage, labs, imaging, and the discharge summary. A charge nurse (orchestrator) routes you to specialists who each do one thing well.

2. The three pieces

Piece        | What it is                                                                   | In our code
Orchestrator | The coordinator. Decides who runs, in what order, and passes results along. | orchestrator.py: investigate(alert)
Sub-agents   | Specialist LLMs, one focused role each.                                      | agents.py: triage, enrich, correlate, report
Connectors   | Functions that talk to outside systems (intel feed, log store).              | connectors.py: lookup_ioc, search_logs, extract_indicators

A connector is just "a function that talks to a system." Today they hold synthetic data, but each is the shape of thing you'd expose as an MCP server (M16) so any agent or app could call it. The orchestrator "initiates connectors" by having its sub-agents call them.
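To make "a function that talks to a system" concrete, here is a minimal sketch of what the three connectors could look like. The function names come from the table above; the signatures, return shapes, and all data are assumptions made up for illustration (the IPs are from documentation-reserved ranges), not the course's actual connectors.py.

```python
import re

# Synthetic intel feed: invented verdicts for documentation-range addresses.
_FAKE_INTEL = {
    "203.0.113.7": {"verdict": "malicious", "source": "synthetic-feed"},
    "198.51.100.23": {"verdict": "benign", "source": "synthetic-feed"},
}

# Synthetic log store: invented events.
_FAKE_LOGS = [
    {"ts": "2024-01-01T10:00:00Z", "src_ip": "203.0.113.7", "event": "failed_login"},
    {"ts": "2024-01-01T10:00:05Z", "src_ip": "203.0.113.7", "event": "login_success"},
    {"ts": "2024-01-01T10:02:00Z", "src_ip": "198.51.100.23", "event": "file_read"},
]

# Loose IPv4 pattern; it does not check that each octet is <= 255.
_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_indicators(text: str) -> list[str]:
    """Return IPv4-looking strings from free text, deduplicated, in first-seen order."""
    seen: list[str] = []
    for match in _IPV4.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


def lookup_ioc(indicator: str) -> dict:
    """Look an indicator up in the (synthetic) intel feed."""
    return _FAKE_INTEL.get(indicator, {"verdict": "unknown", "source": "synthetic-feed"})


def search_logs(indicator: str) -> list[dict]:
    """Return every (synthetic) log row whose source IP matches the indicator."""
    return [row for row in _FAKE_LOGS if row["src_ip"] == indicator]
```

Swapping the dictionaries for a real API call would not change the callers, which is what makes a connector a natural thing to expose behind a stable interface.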

3. The pattern we build (a SOC investigation team)

        alert ──▶ ORCHESTRATOR ─────────────────────────────────────────┐
                      │                                                  │
                      ├─▶ triage   (L1)   ── extract_indicators()        │
                      │      └─ severity + indicators                    │
                      ├─▶ enrich   (intel) ── lookup_ioc()  ◀── connector│
                      │      └─ which indicators are dangerous           │
                      ├─▶ correlate (L2)  ── search_logs()  ◀── connector│
                      │      └─ what the actor actually did              │
                      └─▶ report   (lead) ── synthesizes all of the above│
                                                                         ▼
                                                            INCIDENT REPORT

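Each arrow out of the orchestrator is one sub-agent. Each sub-agent's output feeds the next; that hand-off is the orchestration. Two of them (enrich, correlate) reach outside systems through connectors; triage's extract_indicators() is a deterministic regex over the alert text.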
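The hand-off in the diagram can be sketched as a loop over sub-agents that all read and extend one shared "case" dictionary. This is only an illustration under assumptions: the sub-agents below are plain functions standing in for LLM calls, and the case dictionary, KNOWN_BAD set, and trace field are hypothetical names, not the course's orchestrator.py.

```python
import re

_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
KNOWN_BAD = {"203.0.113.7"}  # synthetic stand-in for the intel connector


def triage(case: dict) -> dict:
    # A real triage agent would call an LLM; indicators still come from a regex.
    indicators = _IPV4.findall(case["alert"])
    return {"indicators": indicators, "severity": "high" if indicators else "low"}


def enrich(case: dict) -> dict:
    # Uses triage's output: which indicators are dangerous?
    return {"dangerous": [i for i in case["indicators"] if i in KNOWN_BAD]}


def correlate(case: dict) -> dict:
    # Uses enrich's output: what did the dangerous indicators do?
    return {"activity": [f"{i} seen in logs" for i in case["dangerous"]]}


def report(case: dict) -> dict:
    # Synthesizes everything gathered so far.
    summary = (
        f"severity={case['severity']}; "
        f"dangerous={case['dangerous']}; activity={case['activity']}"
    )
    return {"report": summary}


PIPELINE = [("triage", triage), ("enrich", enrich), ("correlate", correlate), ("report", report)]


def investigate(alert: str) -> dict:
    """Run each sub-agent in order, merging its output into the shared case."""
    case: dict = {"alert": alert, "trace": []}
    for name, agent in PIPELINE:
        case.update(agent(case))    # the hand-off: later agents see earlier results
        case["trace"].append(name)  # record who ran, in order, for debugging
    return case
```

Because each step only reads keys that earlier steps wrote, any one sub-agent can be replaced or tested alone by handing it a hand-built case dictionary.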

This is sequential orchestration (a pipeline). Other shapes you'll meet:

  • Router / dispatcher: orchestrator picks which sub-agent to call based on the input (the optional challenge).
  • Parallel fan-out + gather: run several sub-agents at once, then merge (e.g. ask 3 specialists, combine).
  • Manager / hierarchical: a sub-agent can itself be an orchestrator of its sub-agents.
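Two of these shapes are short enough to sketch. The router below picks one specialist from the input; the fan-out runs several at once with asyncio.gather and merges the answers. The specialist names and keyword rules are hypothetical, and the async functions are stubs where a real system would await an LLM call.

```python
import asyncio


async def phishing_specialist(alert: str) -> str:
    return f"phishing view: {alert}"


async def malware_specialist(alert: str) -> str:
    return f"malware view: {alert}"


async def network_specialist(alert: str) -> str:
    return f"network view: {alert}"


SPECIALISTS = {
    "phishing": phishing_specialist,
    "malware": malware_specialist,
    "network": network_specialist,
}


def route(alert: str):
    """Router / dispatcher: choose ONE specialist from simple keyword rules."""
    lowered = alert.lower()
    if "email" in lowered:
        return phishing_specialist
    if "binary" in lowered or "hash" in lowered:
        return malware_specialist
    return network_specialist  # default when nothing more specific matches


async def fan_out(alert: str) -> dict:
    """Parallel fan-out + gather: ask every specialist concurrently, then merge."""
    names = list(SPECIALISTS)
    # gather returns results in the same order as the awaitables passed in.
    answers = await asyncio.gather(*(SPECIALISTS[n](alert) for n in names))
    return dict(zip(names, answers))


if __name__ == "__main__":
    chosen = route("Suspicious email attachment")
    print(asyncio.run(chosen("Suspicious email attachment")))
    print(asyncio.run(fan_out("Beaconing to 203.0.113.7")))
```

A production router would often let a model choose the specialist instead of keyword rules; the surrounding control flow stays the same.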

The frameworks in M19 (LangGraph, CrewAI, AutoGen, n8n, …) are mostly different ergonomics for these same shapes. You're building the pattern by hand first so the frameworks make sense.

4. The "agentic deploy"

The orchestrator is plain Python, so we deploy it the way we deployed any service in M11: wrap it in FastAPI (app.py), expose POST /investigate, and ship. Now anything (a SIEM, a Slack bot, a cron job) can hand it an alert and get a report. That's the one deployable agentic system the course is built around; M19 then shows how many ways there are to build the agents inside it.
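A minimal sketch of that wrapper, under assumptions: the JSON body shape ({"alert": "..."}), the handle_investigate and create_app names, and the stub investigate are all hypothetical. The real app.py may look different. FastAPI is imported inside the factory so the handler can be exercised without the framework installed.

```python
import logging
import time

logger = logging.getLogger("investigate")


def investigate(alert: str) -> dict:
    """Stand-in for the orchestrator; the real one runs the four sub-agents."""
    indicators = [tok for tok in alert.split() if tok.count(".") == 3]
    return {"indicators": indicators, "report": f"{len(indicators)} indicator(s) found"}


def handle_investigate(payload: dict) -> dict:
    """Framework-free handler: run the orchestrator, log latency and indicator count."""
    started = time.perf_counter()
    result = investigate(payload["alert"])
    latency_ms = (time.perf_counter() - started) * 1000
    logger.info(
        "investigate latency_ms=%.1f indicators=%d", latency_ms, len(result["indicators"])
    )
    return result


def create_app():
    """App factory: build the FastAPI app and register POST /investigate."""
    from fastapi import Body, FastAPI  # lazy import; requires fastapi to be installed

    app = FastAPI(title="SOC investigation orchestrator")

    @app.post("/investigate")
    def investigate_endpoint(payload: dict = Body(...)) -> dict:
        # Assumed body: {"alert": "<raw alert text>"}
        return handle_investigate(payload)

    return app
```

Assuming the file is saved as app.py, it could be served with `uvicorn app:create_app --factory`. A real service would likely replace the bare dict body with a Pydantic model so a missing "alert" field yields a validation error instead of a server error.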

5. When multi-agent is worth it (and when it isn't)

Use it when the task has genuinely distinct sub-jobs, each benefiting from a focused role or its own tools; when you want to test/upgrade parts independently; when different steps need different models (e.g. cheap claude-haiku-4-5 for triage, claude-opus-4-8 for the report).

Skip it when one prompt does the job: multi-agent multiplies cost (every sub-agent is its own LLM call) and latency (a 4-agent pipeline is ~4 sequential calls). Start with one agent; split only when one agent visibly strains.
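The cost and latency arithmetic is simple enough to write down. In the sketch below the per-step latencies are placeholder numbers chosen only to make the sums easy to read; they are not measurements of any model. The parallel figure applies only to independent specialists (fan-out), not to a pipeline whose steps depend on each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    latency_s: float  # ASSUMED per-call latency, placeholder only


def llm_calls(steps: list[Step]) -> int:
    """Every sub-agent is its own billed call."""
    return len(steps)


def sequential_latency(steps: list[Step]) -> float:
    """Pipeline: each step waits for the previous one, so latencies add up."""
    return sum(s.latency_s for s in steps)


def parallel_latency(steps: list[Step]) -> float:
    """Fan-out: independent steps run together, so the slowest one dominates."""
    return max((s.latency_s for s in steps), default=0.0)


STEPS = [
    Step("triage", 1.0),
    Step("enrich", 2.0),
    Step("correlate", 2.0),
    Step("report", 3.0),
]

if __name__ == "__main__":
    print("calls:", llm_calls(STEPS))
    print("pipeline latency (s):", sequential_latency(STEPS))
    print("fan-out latency (s):", parallel_latency(STEPS))
```

With these placeholder values a single agent would pay for one call, while the pipeline pays for four and waits for the sum of all four latencies.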

6. Risks (read this twice: security module)

  • Cost & loops. Each sub-agent is a billed call; an orchestrator that retries or recurses can multiply that fast. Cap steps; log every call (our app.py logs latency + indicator count).
  • Compounding errors. A wrong triage misleads every downstream agent. Validate hand-offs (here: indicators come from a deterministic regex, not the model's imagination).
  • Oversight / human-in-the-loop. Our agents investigate and recommend; they never act (no blocking IPs, no disabling accounts). Action belongs to a human (M10, M14). An agent that can act needs an approval gate.
  • Authorized & synthetic only. All data here is fake. Pointing real connectors at real systems is a real-world authorization decision, not a lab step.
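Three of these guardrails fit in a few lines each. The sketch below shows a call budget (cap steps), a hand-off check that keeps only indicators a regex can actually find in the alert, and an approval gate that refuses to act without a named human. All class and function names here are hypothetical, and execute_action only returns a string; it performs no real action.

```python
import re
from typing import Optional

_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


class StepBudgetExceeded(RuntimeError):
    """Raised when the orchestrator tries to make more calls than allowed."""


class StepBudget:
    """Cap steps: count every sub-agent call and stop at a hard limit."""

    def __init__(self, max_calls: int) -> None:
        self.max_calls = max_calls
        self.calls: list[str] = []  # doubles as a log of every call made

    def charge(self, agent_name: str) -> None:
        if len(self.calls) >= self.max_calls:
            raise StepBudgetExceeded(f"budget of {self.max_calls} calls used up")
        self.calls.append(agent_name)


def validate_indicators(alert_text: str, claimed: list[str]) -> list[str]:
    """Validate a hand-off: drop any indicator that is not literally in the alert."""
    found = set(_IPV4.findall(alert_text))
    return [ind for ind in claimed if ind in found]


def execute_action(action: str, approved_by: Optional[str] = None) -> str:
    """Approval gate: nothing happens unless a named human signed off."""
    if not approved_by:
        raise PermissionError(f"'{action}' needs human approval before it can run")
    return f"{action} (approved by {approved_by})"
```

The orchestrator would call budget.charge(name) before every sub-agent call, including retries, so a loop fails loudly instead of silently running up cost.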

Words you'll hear

Orchestrator / coordinator, sub-agent / specialist agent, connector, hand-off, pipeline vs. router vs. fan-out vs. hierarchical, human-in-the-loop, SOC / triage / enrich / correlate / IOC (indicator of compromise). Full definitions in the glossary.