Skip to content

AI Engineer: Course Roadmap

A step-by-step path to becoming an AI engineer in 2026, built as you work through this course. Inspired by the format of roadmap.sh/ai-engineer, but every node here maps to a module you actually build.


What is an AI Engineer?

An AI engineer builds applications on top of AI models, chatbots, knowledge assistants, agents, using models that already exist. This is different from two roles people often confuse it with:

  • An AI/ML researcher invents new models and trains them from scratch (heavy maths, GPUs, papers).
  • An ML engineer trains, tunes, and serves custom models on a company's own data.

An AI engineer mostly calls hosted models through an API and engineers everything around them, the prompts, the data (RAG), the tools (agents), the guardrails, and the deployment. You do not need a maths PhD or your own GPUs. You need to be able to write Python, call a model, shape its input and output, give it your data and tools, test it, and ship it: which is exactly the arc of this course.

The honest version: the hard part of AI engineering isn't the AI, it's the engineering. That's why this course spends its first third making you genuinely comfortable with Python before a single API call. Take that part seriously and the rest follows.


How to read this roadmap

Each box is a topic. Colour tells you how to treat it:

Style Meaning
Core Do it. The main path, every learner needs this.
Stretch / alternative Optional depth or a "pick-one-of-many" choice. Skip on a first pass; come back when curious.
Tooling (install only when needed) A tool you set up at that module, never before. We walk through each install in detail.

Order: Part A (M1→M2→M3) is strictly sequential: each builds on the last. In Part B, the modules are best done in order too, but the stretch topics (extra frameworks, multimodal, local models) can be skipped or reordered freely.


The whole path at a glance

flowchart TD
  START(["Start here:<br/>can use a browser, no coding"]):::start
  M0["M0 · AI Engineering, explained<br/>(what it is · how LLMs work)"]:::core
  subgraph A["Part A, Python foundation"]
    direction TB
    M1["M1 · Your first Python program"]:::core
    M2["M2 · Logic & data (decisions, lists, dicts)"]:::core
    M3["M3 · Functions, files, libraries & errors"]:::core
    M1 --> M2 --> M3
  end
  subgraph B["Part B, Building with AI"]
    direction TB
    M4["M4 · Ship your first AI app"]:::core
    M5["M5 · Prompt engineering"]:::core
    M6["M6 · Driving the model from code"]:::core
    M7["M7 · RAG I, your own knowledge"]:::core
    M8["M8 · RAG II, make it good"]:::core
    M9["M9 · Agents (tools, frameworks) ·~2 sessions"]:::core
    M10["M10 · Evaluation, guardrails & security"]:::core
    M11["M11 · Deployment & productionizing"]:::core
    M4 --> M5 --> M6 --> M7 --> M8 --> M9 --> M10 --> M11
  end
  CAP{{"Capstone · design, build & demo<br/>a complete AI app of your choice"}}:::cap
  START --> M0 --> M1
  M3 --> M4
  M11 --> CAP

  classDef start fill:#D6E8FF,stroke:#3B82C4,color:#123;
  classDef core fill:#FFE08A,stroke:#E6A700,color:#222;
  classDef cap fill:#B7E1A1,stroke:#3FA34D,color:#143;

Orientation

M0: AI Engineering, explained · Core · open module

Win: explain what AI engineering is and how an LLM works; compare two models yourself. - [ ] What AI engineering is, vs ML engineer / data scientist / researcher; AI vs AGI - [ ] How LLMs work: next-token prediction, tokens, context window, training vs inference, hallucination - [ ] The model landscape (closed vs open, the big families) and how to choose a model - [ ] How the toolkit fits: prompting → RAG → agents → fine-tuning; a first look at responsible AI - [ ] No install, a browser-only "meet the models" comparison

More breadth is on the way (planned): Multimodal (images/audio), Open-source & local models (Hugging Face / Ollama), and a broadened Ethics & Responsible AI module.


Part A: Python foundation

Build the tool before you build with it. Browser-only (Colab) through M2; install Python locally at M3.

flowchart LR
  M1["M1 · First program"]:::core --> M2["M2 · Logic & data"]:::core --> M3["M3 · Functions, files, libraries"]:::core

  M1 --- t1["print · variables<br/>types · input · f-strings"]:::core
  M1 --- t1b["Google Colab<br/>(browser notebook)"]:::tool

  M2 --- t2["if / elif / else · comparisons<br/>for & while loops"]:::core
  M2 --- t2b["lists · dictionaries<br/>(the shape APIs speak)"]:::core

  M3 --- t3["functions · imports<br/>read/write files · JSON · try/except"]:::core
  M3 --- t3b["Python + virtualenv + pip<br/>(first local install)"]:::tool
  M3 --- t3c["PyTorch, CPU vs GPU<br/>(only if running models locally)"]:::stretch

  classDef core fill:#FFE08A,stroke:#E6A700,color:#222;
  classDef stretch fill:#E6D7FF,stroke:#9B6DFF,color:#222;
  classDef tool fill:#ECECEC,stroke:#999,color:#222,stroke-dasharray:4 3;

M1: Your first Python program · Core · open module

Win: write & run your own Python program. - [ ] What a program is; running Python in Colab (no install) - [ ] print (output) and comments - [ ] Variables and the three basic types (text / number / true-false) - [ ] input and f-strings → a personalized tip helper

M2: Logic & data · Core · open module

Win: a program that decides and processes a list. - [ ] Comparisons (==, >, …) and if / elif / else - [ ] for and while loops - [ ] Lists (ordered) and dictionaries (labelled, the shape of API/JSON data) - [ ] Build: a budget categorizer over a list of dictionaries

M3: Functions, files, libraries & errors · Core · open module

Win: organized, reusable code that reads/writes data and survives mistakes. - [ ] Functions and import; pip & libraries - [ ] First local install: Python + a virtual environment (OS-aware, detailed) - [ ] Read/write files; JSON (the text form of M2's dictionaries); try / except - [ ] Optional tooling box: PyTorch CPU-only vs GPU/CUDA, only if you'll run models locally; the rest of the course uses hosted APIs.


Part B: Building with AI

flowchart LR
  M4["M4 · First AI app"]:::core --> M5["M5 · Prompting"]:::core --> M6["M6 · API in code"]:::core --> M7["M7 · RAG I"]:::core --> M8["M8 · RAG II"]:::core --> M9["M9 · Agents"]:::core --> M10["M10 · Eval & security"]:::core --> M11["M11 · Deploy"]:::core

  M4 --- t4["what an LLM is (builder's view)<br/>request → response"]:::core
  M4 --- t4b["API key in .env<br/>(secrets hygiene)"]:::tool

  M5 --- t5["system vs user prompts · few-shot<br/>chain-of-thought · structured output"]:::core

  M6 --- t6["messages API · temperature/max tokens<br/>streaming · parse JSON output"]:::core

  M7 --- t7["embeddings · chunking · retrieval"]:::core
  M7 --- t7b["vector store: Chroma / FAISS<br/>(install & configure)"]:::tool

  M8 --- t8["retrieval quality · reranking<br/>a simple eval set"]:::core

  M9 --- t9["function calling from first principles<br/>ReAct loop · memory"]:::core
  M9 --- t9b["one framework deep:<br/>LangGraph or CrewAI"]:::core
  M9 --- t9c["survey: AutoGen · OpenAI/Claude SDKs ·<br/>LlamaIndex · smolagents · Hermes · MCP"]:::stretch

  M10 --- t10["OWASP LLM Top 10 · prompt injection<br/>excessive agency · red-team → guardrails"]:::core

  M11 --- t11["FastAPI · cost/latency · monitoring"]:::core
  M11 --- t11b["Docker container<br/>(callback to Course 01)"]:::tool

  classDef core fill:#FFE08A,stroke:#E6A700,color:#222;
  classDef stretch fill:#E6D7FF,stroke:#9B6DFF,color:#222;
  classDef tool fill:#ECECEC,stroke:#999,color:#222,stroke-dasharray:4 3;

M4: Ship your first AI app · Core · open module

Win: a working chatbot you understand line by line. - [ ] What an LLM is from a builder's view; request → response - [ ] API key setup in full: create account → make a key → store in .env → load it → tiny test call (never commit keys) - [ ] A minimal chatbot with a personality

M5: Prompt engineering · Core · open module

Win: reliably get a model to do what you want. - [ ] System vs user prompts; few-shot; chain-of-thought; structured output - [ ] A/B two prompts on a tool from your own life/work and see the difference

M6: Driving the model from code · Core · open module

Win: use the API fluently from real programs. - [ ] The messages API; parameters (temperature, max tokens); streaming - [ ] Parse a structured JSON response into something useful (builds on M3's JSON)

M7: RAG I: give the AI your own knowledge · Core · open module

Win: an assistant that answers questions about your documents. - [ ] Why models don't know your data; embeddings; chunking; vector search - [ ] Install & configure a vector store (Chroma or FAISS) - [ ] Q&A over a document you choose

M8: RAG II: make it good · Core · open module

Win: tell whether your RAG app is correct, and improve it. - [ ] Retrieval quality; chunk-size tradeoffs; reranking - [ ] Run a simple eval set (does the answer match the source?)

M9: Agents: tools, function calling & frameworks · Core (≈2 sessions) · open module

Win: an AI that takes actions, not just talks. - [ ] Function calling from first principles: schema → model picks a tool → you run it → return the result - [ ] One framework, deeply (LangGraph or CrewAI); add memory - [ ] Survey only (don't drown): AutoGen/AG2, OpenAI Agents SDK, Claude Agent SDK, LlamaIndex, smolagents, Hermes, MCP - [ ] Headline project: a SOC L1/L2 security assistant on sample/synthetic data only, plus a non-security agent

M10: Evaluation, guardrails & security · Core · open module

Win: test your app and stop it being tricked or misused. - [ ] Evaluating LLM apps; OWASP LLM Top 10; prompt injection; excessive agency - [ ] Red-team your own app, then add a guardrail layer and re-test (authorized/educational only)

M11: Deployment & productionizing · Core · open module

Win: your app runs for real, not just on your laptop. - [ ] Wrap it in FastAPI; config via .env; basic monitoring; cost/latency - [ ] Containerize with Docker (callback to Course 01)

Capstone · Build & demo

Design, build, and demo a complete AI app of your choice, and explain how it works and how you'd secure it, in your own Python. Pick a track: - Knowledge assistant (RAG): Q&A over a real document set, evaluated and deployed. - Action agent: a tool-using agent that completes a multi-step task, with guardrails. - Your idea: anything LLM-powered that solves a real problem from your life or work.

Requirements: it runs, it handles a basic failure gracefully, and you can explain it.


Part C: Breadth & responsibility (extend the core path)

These widen the course toward the full AI-engineer landscape. They're not required to ship the capstone, take them when relevant.

M12: Multimodal AI · Core-ish · open module (best after M6)

Win: an app that understands an image you give it. - [ ] Vision / image understanding: image + question → answer (an image content block) - [ ] Read text from images (OCR-style); extract fields from a photo (+ M6 structured output) - [ ] Survey: image generation (DALL·E / Stable Diffusion), speech-to-text (Whisper), TTS, video

M13: Open-source & local models · Breadth · open module

Win: run an open model locally, free, offline, private. - [ ] Closed vs open / local vs hosted trade-offs (capability, cost, speed, privacy) - [ ] Ollama: pull a small model, chat in the terminal, call its local API from Python (no key!) - [ ] Survey: Hugging Face (the model hub), LM Studio; quantization (why small models fit a laptop)

M14: Ethics & Responsible AI · Breadth · open module

Win: test for bias, protect privacy, keep humans in the loop. - [ ] Fairness probe: same task, swap a sensitive attribute, compare (find unfair treatment) - [ ] Privacy: redact PII before sending; data minimization (no key needed) - [ ] Transparency, hallucination honesty, content moderation, human-in-the-loop, accountability - [ ] Frameworks to know: EU AI Act, NIST AI RMF, ISO/IEC 42001

M15: Fine-tuning & training · Breadth · open module

Win: build a fine-tuning dataset and know when (and how) to fine-tune. - [ ] How training works: neural nets → transformers → pre-training → fine-tuning (SFT) → RLHF - [ ] Build & validate a chat-format JSONL dataset (the part that decides quality) - [ ] Submit a fine-tune (hosted API) or local LoRA; evaluate it; overfitting/forgetting risks - [ ] The rule: prompt → RAG → fine-tune (fine-tune for behaviour, RAG for facts)

M16: Building MCP servers & clients · Breadth · open module (best after M9)

Win: expose your tools over MCP so any AI app can use them. - [ ] MCP = a standard ("USB-C for AI tools"); server (exposes tools) vs client (consumes them) - [ ] Build a FastMCP server (@mcp.tool()) and a client (discover → call), runs locally, no key - [ ] Connect your server to a real app (Claude Desktop); MCP security (tool poisoning, least privilege)

M17: Build a language model from scratch · Optional deep-dive · open module

Win: train a tiny LM from scratch and meet the transformer, so LLMs aren't magic. - [ ] The training loop: predict next token → loss → nudge weights → repeat → generate (tiny numpy model) - [ ] A real transformer in miniature (PyTorch): embeddings, self-attention, blocks - [ ] How scale + data + compute + RLHF (M15) turn this into an LLM - [ ] Note: a "researcher" lab for understanding, not the AI engineer's day job (building with models is).


Part D: Agentic Systems (the capstone track)

Everything so far comes together here: you deploy one orchestrated multi-agent system, then learn to build agents across many frameworks. This is the "agentic deploy that initiates connectors and coordinates security sub-agents" the course is built toward.

M18: Multi-agent orchestration · Flagship · open module (builds on M9 + M16 + M11)

Win: a deployed orchestrator coordinating a team of security sub-agents through connectors. - [ ] Orchestrator vs. sub-agents: one coordinator hands focused jobs to specialists (a SOC team) - [ ] Build the pipeline: triage → enrich → correlate → report, each a sub-agent passing results on - [ ] Connectors: sub-agents reach a (synthetic) threat-intel feed + log store, the shape of MCP servers (M16) - [ ] The agentic deploy: wrap the orchestrator in FastAPI POST /investigate (M11) and ship it - [ ] When multi-agent beats one agent; the risks (cost, loops, compounding errors, human-in-the-loop) - [ ] Educational, authorized, synthetic data only; agents investigate & recommend, they never act

M19: Build agents in many frameworks · Flagship breadth · open module

Win: build the SAME agent across the major frameworks, so you can pick the right tool per job. - [ ] From scratch (hand-rolled ReAct loop) → why frameworks exist (verified: real loop runs) - [ ] LangChain / LangGraph (verified: graph builds), CrewAI, AutoGen, OpenAI Agents SDK, Claude Agent SDK, smolagents, LlamaIndex (doc-grounded reference; run as pilots) - [ ] No-code/low-code: n8n workflow agent (importable workflow, connect tools without writing the loop) - [ ] The model-string gotcha (bare id vs. LiteLLM anthropic/…); how they map onto M18's orchestration shapes

M20: Agent observability and evaluation · Flagship · open module (builds on M9 + M19 + M10)

Win: a tracer that shows every step an agent took, plus an eval scorecard that turns red the instant the agent regresses. - [ ] Observability: record each model call and tool call as spans in a trace (inputs, outputs, tokens, latency, errors) (verified: real loop, mock model) - [ ] Evaluation: a golden test set plus rule-based scorers that check the answer AND the trace (verified: 100% on a correct agent, 0% on a broken one) - [ ] Catch a regression: break the tool on purpose and watch the suite fail and name the failing check - [ ] LLM-as-judge for open-ended answers (pilot); the production tools (LangSmith, Langfuse, Arize Phoenix, OpenTelemetry) - [ ] What to monitor in production: cost, latency, error rate, tool usage, plus human-in-the-loop review (M14)

M21: Agent memory and state · Flagship · open module (builds on M7 + M9 + M20)

Win: an agent that remembers you within a chat, recalls facts in a brand new session, and can pause and resume. - [ ] Memory is prompt construction: the model stores nothing; you choose what to feed back in - [ ] Short-term memory: the conversation kept under a token budget (drop-oldest, then summarize-oldest) (verified offline) - [ ] Long-term memory: facts saved to disk and recalled by relevance across sessions (verified offline) - [ ] Keyword recall here vs embeddings in a vector store (M7) for true semantic recall; same remember/recall shape - [ ] Checkpointing: save and resume the whole state (the role of LangGraph checkpointers) (verified offline)

M22: Agent reliability and ops · Flagship · open module (builds on M11 + M18 + M20)

Win: an agent that recovers from a flaky API, refuses to loop, fails safely in an outage, and gates risky actions behind a human. - [ ] Retry with exponential backoff on transient errors only (not permanent ones) (verified offline) - [ ] Timeout: give up on a hung call so one slow dependency cannot freeze the agent (verified offline) - [ ] Fallback / graceful degradation: return a safe message instead of crashing during an outage (verified offline) - [ ] Step cap: stop a runaway loop before it burns tokens (pairs with M20 cost visibility) (verified offline) - [ ] Human-approval gate for risky, world-changing actions (human-in-the-loop, M14) (verified offline) - [ ] The challenge: a circuit breaker that stops calling a dead service entirely

M23: Agent security · Flagship · open module (builds on M10 + M16 + M22)

Win: the same prompt-injection attack leaks a secret from a vulnerable agent and is blocked at multiple layers by a hardened one. Synthetic, defensive, offline. - [ ] Indirect prompt injection: instructions hidden in content the agent reads (the main agentic threat) (verified offline) - [ ] Excessive agency and data exfiltration: how an over-powered tool turns a hijack into real harm (verified offline) - [ ] Defenses: treat content as data, detect injection, least privilege / allowlists, redact secrets, approval gate (M22) - [ ] Defense in depth: the model is fooled yet the secret is contained, because the tool is locked down (verified offline) - [ ] The OWASP Top 10 for LLM Applications as a shipping checklist; educational and authorized-use only

M24: Agentic RAG and research agents · Flagship · open module (builds on M7-M8 + M9)

Win: a research agent that answers a multi-hop question one-shot RAG cannot, by searching more than once and citing its sources. - [ ] Where one-shot RAG (M7-M8) stops: multi-hop questions, bad first queries, over-retrieval (verified offline) - [ ] Retrieval as a TOOL inside the M9 loop: search, read, refine the query, search again (verified offline) - [ ] Answer with citations to the documents used; decide NOT to search when it would not help (verified offline) - [ ] Keyword search here vs embeddings in a vector store (M7) behind the same search tool - [ ] Cost and guards: each search is another model call (M20); cap searches (M22)

M25: Cost and performance optimization · Flagship · open module (builds on M20 + M6)

Win: the same agent pipeline taken from 669 dollars per 10,000 runs down to about 322 and faster, with the trade-offs measured offline. - [ ] Estimate dollars and latency from token counts (a pricing model, no spend) (verified offline) - [ ] Prompt caching: pay once for a stable prefix, read it cheaply after (about half the cost) (verified offline) - [ ] Model routing: cheap fast model for easy steps, strong for hard; cuts cost AND latency (verified offline) - [ ] Token trimming, and why the hard steps' output tokens are the cost you cannot cheaply cut - [ ] Every optimization is a bet on quality: re-run the M20 evals to confirm it held; streaming and the Batch API

M26: Evaluation-driven development and CI · Flagship · open module (builds on M20 + M11)

Win: an eval gate that runs on every push, passes a good change, and blocks a regression automatically. - [ ] A versioned eval set in the repo that grows over time (every bug becomes a test) (verified offline) - [ ] An eval gate as an EXIT CODE: 0 passes the build, non-zero blocks the merge (verified offline) - [ ] A GitHub Actions workflow that runs the gate on every push and pull request (real sample workflow) - [ ] What runs in CI (deterministic, mock/recorded, free) vs on a schedule (live, key as a secret) - [ ] Track quality over time; choose a threshold; keep CI deterministic so red always means a real problem

M27: Part D capstone, ship a complete agent · Flagship finale · open module (integrates M18-M26)

Win: one deployable support agent that uses every Part D pattern at once, with a green eval gate over it. - [ ] Assemble agentic RAG (M24) + memory (M21) + the M9 tool loop into a single agent (verified offline) - [ ] Wrap it with tracing + cost (M20/M25), reliability (M22), and security guards (M22/M23) (verified offline) - [ ] Serve it behind a FastAPI /chat endpoint (M11/M18) (verified via TestClient) - [ ] Gate the whole agent with an eval suite that exits non-zero on a regression (M20/M26) (verified: 3/3, exit 0) - [ ] Know what is simplified vs production (persisted memory, vector store, live model, real usage) and how to harden it

M28: Agent UX and streaming to a UI · Flagship · open module (builds on M6 + M24 + M11)

Win: an agent that streams its progress and answer live, shows sources and cost, can be cancelled, and serves over SSE. - [ ] Perceived latency: why streaming the answer (time-to-first-token) is the biggest UX win (verified offline) - [ ] The agent as an event stream (generator): status, tool, citation, token, cost, done (verified offline) - [ ] Surface citations (M24) and cost (M20/M25) in the UI; stream the answer token by token (M6) - [ ] Cancellation for free: stop iterating, the agent stops, no more cost (verified offline) - [ ] Serve it over Server-Sent Events with FastAPI (M11/M18) (verified via TestClient)

M29: Agent deployment and serving · Flagship · open module (builds on M11 + M21)

Win: a production-shaped agent service with env config, health/readiness probes, graceful lifecycle, statelessness, and a real container. - [ ] Config from the environment (12-factor); secrets from the env, never the code; fail fast on bad config (verified offline) - [ ] Liveness (/healthz) vs readiness (/readyz) probes, and why they differ (verified offline) - [ ] Graceful startup and shutdown (warm up, then drain) via lifespan (verified offline) - [ ] Statelessness so you can run many replicas behind a load balancer (the M21 caution applied) (verified offline) - [ ] A production Dockerfile: slim base, non-root user, healthcheck, pinned deps; workers vs async concurrency

M30: Agent data and feedback loops · Flagship · open module (builds on M20/M26 + M15 + M14)

Win: a feedback pipeline that turns a real down-vote into both a regression eval case and a corrected training example, PII redacted. - [ ] Capture interactions + feedback, redacting PII at write time (privacy first, M14) (verified offline) - [ ] The three signals: thumbs up (golden), down+correction (regression + fix), down-no-fix (human review) (verified offline) - [ ] Curate into eval cases (M26): up -> golden, down+correction -> regression, deduped (verified offline) - [ ] Curate into fine-tuning examples (M15): chat-format, learn from good and corrected answers (verified offline) - [ ] Close the loop on a cadence; curation is judgement (dedupe/filter/balance/review); beware feedback bias and amplification


Part D extended → Part E: Operations Support (safeguard the running system)

Operations support is the layer wrapped around everything above. AI engineering is the backbone; operations support safeguards the architecture, the databases, and the builds, keeping what you shipped running, supported, and recoverable. Built additively on Part D; every lab runs offline.

Orientation: Operations Support, explained — what the role is, the LLMOps/AgentOps/AIOps lenses, the deploy → observe → respond → improve loop, and a day in the role. Read it before M31.

M31: Incident response and on-call · Flagship · open module (builds on M20 + M22 + M26)

Win: an on-call drill where an outage burns an error budget, pages you, is mitigated by a runbook, and ends in a blameless postmortem and a regression test. - [ ] Define "healthy" as a number: SLO, SLI, error budget (verified offline) - [ ] Burn rate + two-window alerting: when to page vs open a ticket (verified offline) - [ ] The incident lifecycle: detect → triage → mitigate → resolve, on a timeline (verified offline) - [ ] Runbooks of safe, reversible mitigations; stop as soon as it is healthy (verified offline) - [ ] Blameless postmortem → regression eval (M26); the challenge: an escalation ladder

M32: AI support desk and AIOps · Flagship · open module (builds on M9/M19 + M22 + M30)

Win: a desk that triages tickets by severity and confidence, routes them under an SLA, escalates breaches, sends the uncertain to a human, and collapses a 40-alert storm into a few incidents. - [ ] Triage = classify + confidence; the confidence gate hands uncertain cases to a human (verified offline) - [ ] Route by severity to a tier (L1/L2/L3) under an SLA; escalate on breach (verified offline) - [ ] AIOps: correlate an alert storm into incidents (page on causes, not symptoms) (verified offline) - [ ] Each correlated incident is exactly what M31 opens and fixes; challenge: an SLA-urgency queue

M33: Data and release operations · Flagship · open module (builds on M7 + M11/M26/M29 + M14)

Win: an index that redacts PII, re-embeds only what changed, sweeps expired data, and restores from a backup; plus a canary that promotes a good release and rolls back a bad one, with zero-downtime secret rotation. - [ ] Safeguard the database: PII redaction on write, staleness → selective reindex, retention (verified offline) - [ ] Backups you can actually restore (snapshot/restore) (verified offline) - [ ] Safeguard the build: canary vs the live baseline on an eval set → promote or rollback (verified offline) - [ ] Secret rotation with a grace window (zero downtime); challenge: a progressive canary ramp

M34: Part E capstone, the on-call shift · Flagship finale · open module (integrates M31-M33 + M26 + M27)

Win: one on-call drill where M31 + M32 + M33 handle a single incident end to end, with an eval gate scoring the whole shift. - [ ] Intake: an alert storm correlates into incidents; a user ticket is triaged and linked (M32) (verified offline) - [ ] Detect: the burning error budget pages sev1 (M31) (verified offline) - [ ] Mitigate: the runbook rolls back the bad release; the SLI recovers 80% → 100% (M31+M33) (verified offline) - [ ] Learn: a blameless postmortem becomes a regression eval (M31 → M26) (verified offline) - [ ] Ship the fix behind a canary; an eval gate scores the shift and exits non-zero on a regression (verified: 7/7, exit 0)

M35: Operations Support, going deeper · Optional go-deeper · open module (extends M20/M25/M26/M29/M30)

Win: five small, runnable operations tools, each extending a module you already built, all verified offline. - [ ] Structured, correlated logs you can query for one request's whole story (M20) (verified offline) - [ ] The four golden signals + SLO burn rendered into a dashboard with breach flags (M20/M31) (verified offline) - [ ] Online evaluation on sampled live traffic that catches drift the offline gate missed (M26/M30) (verified offline) - [ ] A token-bucket rate limit, a per-tenant quota, and a concurrency limit (M25/M29) (verified offline) - [ ] The reliability flywheel: incidents become regression guards and repeats fall (M30/M31/M26) (verified offline)


Where to go deeper

Each AI module points outward to the AI Engineering Resource Map for learners who want more. High-value external paths (study these after the matching module):

  • Agents (M9): the Hugging Face Agents Course (incl. its agentic-RAG unit); the MCP spec.
  • RAG (M7-M8): DeepLearning.AI short courses on RAG and evaluation.
  • Security (M9-M10): the OWASP LLM Top 10, plus the security labs, red-team tools, and sample datasets listed in the Resource Map. (Educational, authorized-use only.)
  • Breadth: the "awesome generative AI" guide referenced in the Resource Map.

Track your progress: tick the [ ] boxes above as you finish each topic, that's your personal version of this roadmap. The course README links every module folder.