
Lab M29: serve an agent like production

You'll need: your venv and fastapi plus uvicorn (from M11). The core lab needs no API key and costs nothing (TestClient, in-process). Docker is optional for the last step. Time: about 45 minutes. Work in your breakout pair.

Heads up: the agent here is a stub on purpose. This module is about the serving shell around it: config, probes, lifecycle, statelessness, and the container. Nothing here can harm your computer.

This lab has two parts:

  • Part A: config from the environment, and the two health probes.
  • Part B: readiness gating, statelessness, and the production Dockerfile.

flowchart LR
  LB["load balancer"] --> R1["replica 1"]
  LB --> R2["replica 2"]
  LB --> R3["replica 3"]
  R1 -. "/readyz 503 until warm" .-> LB
  STORE[("session store\n(state lives here)")] --- R1
  STORE --- R2
  STORE --- R3

Part A: config and probes

Step 1: Set up

Copy the solution/ files into a folder. Activate your venv.

pip install fastapi "uvicorn[standard]"
python -c "import fastapi; print('ready')"
You should now see: ready.

Step 2: Run the demo

python demo.py
You should now see config coming from the environment and being validated:
==== CONFIG comes from the environment ====
defaults: {... 'environment': 'development', 'max_steps': 6, 'api_key_set': False}
prod without key, validate(): ['ANTHROPIC_API_KEY is required in production']
custom env -> max_steps: 10 | log_level: DEBUG | key set: True (value never logged)
You should now see: changing an env var changes behaviour with no code change, production refuses to start without its key (fail fast), and the secret's value is never printed. Open config.py and read validate and redacted.
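
For orientation, the load / validate / redacted pattern can be sketched roughly as below. This is a minimal sketch, not the lab's actual config.py: the field names and the error message mirror the demo output, but the Config class, the AGENT_ENV variable name, and the INFO default for the log level are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Dict, List, Mapping, Optional


@dataclass(frozen=True)
class Config:
    environment: str
    max_steps: int
    log_level: str
    api_key: Optional[str]

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the config is usable."""
        errors = []
        if self.environment == "production" and not self.api_key:
            errors.append("ANTHROPIC_API_KEY is required in production")
        return errors

    def redacted(self) -> Dict[str, object]:
        """Safe-to-log view: reports whether the key is set, never its value."""
        return {
            "environment": self.environment,
            "max_steps": self.max_steps,
            "log_level": self.log_level,
            "api_key_set": bool(self.api_key),
        }


def load(env: Optional[Mapping[str, str]] = None) -> Config:
    """Build a Config from a mapping; defaults to the real process environment."""
    env = os.environ if env is None else env
    return Config(
        # AGENT_ENV is a hypothetical variable name, chosen for this sketch.
        environment=env.get("AGENT_ENV", "development"),
        max_steps=int(env.get("AGENT_MAX_STEPS", "6")),
        # The INFO default is an assumption; upper() turns "debug" into "DEBUG".
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        api_key=env.get("ANTHROPIC_API_KEY") or None,
    )
```

Passing a plain dict to load() makes the behaviour easy to test without touching the real environment. A service using this pattern would typically call validate() once at startup and exit if the list is non-empty.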

Step 3: Compare the two probes

In the demo output:

GET /healthz: 200 {'status': 'alive'}
GET /readyz (warmed): 200
Open app.py and read healthz and readyz.

You should now see: /healthz answers "is the process alive?" (fail = restart me), /readyz answers "should I get traffic yet?" (not-ready = hold traffic back). Two different questions, two different consequences.
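
To make the "two questions, two consequences" idea concrete, here is a framework-free sketch. It is not the code in app.py: the handlers return plain (status, body) tuples instead of FastAPI responses, the /readyz body text is assumed, and orchestrator_action is a hypothetical helper that simplifies what a real platform does (real probes also use timeouts and failure thresholds).

```python
from typing import Dict, Tuple

Response = Tuple[int, Dict[str, str]]

# In-process readiness flag. STATE["ready"] is the name used in the
# troubleshooting notes; everything else here is illustrative.
STATE: Dict[str, bool] = {"ready": False}


def healthz() -> Response:
    """Liveness: if this code runs at all, the process is alive."""
    return 200, {"status": "alive"}


def readyz() -> Response:
    """Readiness: 200 only once the instance is warm enough for traffic."""
    if STATE["ready"]:
        return 200, {"status": "ready"}  # body text is an assumption
    return 503, {"status": "not ready"}  # body text is an assumption


def orchestrator_action(live_status: int, ready_status: int) -> str:
    """Hypothetical helper: how a platform reacts to the two probe results."""
    if live_status != 200:
        return "restart the container"
    if ready_status != 200:
        return "hold traffic back"
    return "route traffic"
```

Note that healthz never consults STATE: a replica that is still warming up is alive and should not be restarted, only kept out of rotation.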

Step 4: Set config via the environment yourself

AGENT_MAX_STEPS=12 LOG_LEVEL=debug python -c "import config; print(config.load().redacted())"
You should now see: max_steps is 12 and log_level is DEBUG, set entirely from the environment. Same code, different config. This is how one image runs in dev, staging, and prod.


Part B: readiness, statelessness, and the container

Step 5: Watch readiness gate traffic

In the demo output:

GET /healthz while not ready: 200 (still alive)
GET /readyz while not ready: 503 (keep traffic away)
POST /chat while not ready: 503
You should now see: while the app is not ready (starting up or draining), the process is still alive (/healthz returns 200) but /readyz and /chat return 503, so the load balancer keeps users off this replica until it is warm. This is what prevents requests from hitting a half-started replica during a deploy.
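
The startup and drain behaviour can be sketched with a plain context manager. This is an illustration under stated assumptions, not the lab's app.py: lifespan and chat here are hypothetical stand-ins, and the 503 body text is made up. FastAPI offers a lifespan hook with a similar shape (an async context manager passed to the application).

```python
from contextlib import contextmanager
from typing import Dict, Iterator, Tuple

# In-process readiness flag, false until startup work has finished.
STATE: Dict[str, bool] = {"ready": False}


@contextmanager
def lifespan() -> Iterator[None]:
    """Hypothetical lifecycle: warm up, serve, then drain on the way out."""
    # Startup work (loading models, opening connections) would go here.
    STATE["ready"] = True
    try:
        yield
    finally:
        # Draining: stop accepting new work before the process exits.
        STATE["ready"] = False


def chat(message: str) -> Tuple[int, Dict[str, str]]:
    """Stub agent endpoint that refuses work while the replica is not ready."""
    if not STATE["ready"]:
        return 503, {"detail": "not ready"}  # body text is an assumption
    return 200, {"answer": f"You said: {message}"}
```

Used like this, a request is refused before startup, served while the block is active, and refused again once draining begins:

```python
print(chat("hi")[0])      # before startup
with lifespan():
    print(chat("hi")[0])  # while serving
print(chat("hi")[0])      # after drain
```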

Step 6: See that the service is stateless

In the demo output:

r1: {'answer': 'You said: first',  'session_id': 'a'}
r2: {'answer': 'You said: second', 'session_id': 'b'}
independent (no memory bleed): True
You should now see: each request is independent; the server keeps no per-process session memory. That is what lets you run many replicas behind a load balancer (any replica can serve any request). Per-user memory lives OUTSIDE the process, keyed by session_id (the M21 store). Read the note in app.py on ChatIn.session_id.
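
The idea of keeping memory outside the process can be sketched as follows. This is a toy model, not the M21 store or the lab's app.py: SessionStore, Replica, and the turns field are hypothetical names, and a dict stands in for whatever external store a real deployment would use.

```python
from typing import Dict, List


class SessionStore:
    """Stand-in for an external store; a dict plays that role in this sketch."""

    def __init__(self) -> None:
        self._data: Dict[str, List[str]] = {}

    def append(self, session_id: str, message: str) -> None:
        self._data.setdefault(session_id, []).append(message)

    def history(self, session_id: str) -> List[str]:
        return list(self._data.get(session_id, []))


class Replica:
    """One server process. It holds a handle to the store, but no session data."""

    def __init__(self, store: SessionStore) -> None:
        self.store = store

    def chat(self, message: str, session_id: str) -> Dict[str, object]:
        self.store.append(session_id, message)
        return {
            "answer": f"You said: {message}",
            "session_id": session_id,
            # "turns" is an illustrative extra field, not part of the lab's response.
            "turns": len(self.store.history(session_id)),
        }


store = SessionStore()
r1, r2 = Replica(store), Replica(store)
first = r1.chat("first", "a")
second = r2.chat("second", "a")  # a different replica continues session "a"
other = r2.chat("hello", "b")    # session "b" sees nothing from session "a"
```

Because neither Replica keeps anything between calls, it does not matter which one the load balancer picks: session "a" reaches two turns even though each turn was served by a different replica.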

Step 7: Read the production Dockerfile

Open Dockerfile. Find: a slim base image, deps installed before the code is copied (layer caching), a non-root appuser, EXPOSE 8000, a HEALTHCHECK that hits /healthz, and a pinned requirements.txt.

You should now see: each line is a habit that prevents a real incident (root containers, unpinned deps, no healthcheck). Without Docker, this step is read-only. If you have Docker, build and run the image:

docker build -t agent-service .
docker run -p 8000:8000 -e ANTHROPIC_API_KEY=sk-... agent-service

Step 8: Run it for real (optional)

uvicorn app:app --host 0.0.0.0 --port 8000 --workers 2
curl -s localhost:8000/healthz ; echo
curl -s localhost:8000/readyz ; echo
curl -s -X POST localhost:8000/chat -H 'Content-Type: application/json' -d '{"message":"hello"}'
You should now see: alive, ready, and a chat response, served by 2 worker processes so requests can be handled concurrently. Press Ctrl-C to stop.

Step 9: Show it

Post your /healthz and /readyz responses (warmed and not-ready), and one sentence explaining the difference between liveness and readiness in your own words.


If you get stuck

  • ModuleNotFoundError: No module named 'fastapi' -> pip install fastapi "uvicorn[standard]" (from M11).
  • /readyz is always 200 -> readiness flips to false only during startup/drain or when you set STATE["ready"]=False; the demo simulates it.
  • Where does the API key go? -> the environment (.env locally, the platform's secret store in prod), never the code. config.py reads it from there.
  • Docker step fails -> it is optional; the rest of the lab needs no Docker. Check Docker is installed and running.

Check yourself

What is the difference between a liveness and a readiness probe?
Liveness asks "is the process alive?" (fail = restart the container). Readiness asks "should this instance receive traffic now?" (not ready = hold traffic back during startup or draining). Different questions, different actions.

Why must a service be stateless to scale horizontally?
Because requests are spread across many replicas and any one can serve any request. If a replica kept per-process memory, another replica would not have it. State must live outside the process (a store), keyed by something the request carries (a session id).

Why read config from the environment instead of hardcoding it?
So the same image runs unchanged in dev, staging, and prod (only env vars differ), and so secrets never live in the code or repo. You can also validate config at startup and fail fast on a bad value.

Name two things the production Dockerfile does that a quick-and-dirty one skips.
Any two of these: it runs as a non-root user, pins dependencies, installs deps in a cached layer before copying code, and adds a HEALTHCHECK. Each prevents a common production problem.