Lab M29: serve an agent like production
You'll need: your venv and fastapi plus uvicorn (from M11). The core lab needs no API key and
costs nothing (TestClient, in-process). Docker is optional for the last step. Time: about 45
minutes. Work in your breakout pair.
Heads up: the agent here is a stub on purpose. This module is about the SERVING shell around it: config, probes, lifecycle, statelessness, and the container. Nothing here can harm your computer.
This lab has two parts:
- Part A: config from the environment, and the two health probes.
- Part B: readiness gating, statelessness, and the production Dockerfile.
flowchart LR
LB["load balancer"] --> R1["replica 1"]
LB --> R2["replica 2"]
LB --> R3["replica 3"]
R1 -. "/readyz 503 until warm" .-> LB
STORE[("session store\n(state lives here)")] --- R1
STORE --- R2
STORE --- R3
Part A: config and probes
Step 1: Set up
Copy the solution/ files into a folder. Activate your venv.
pip install fastapi "uvicorn[standard]"
python -c "import fastapi; print('ready')"
You should now see: ready.
Step 2: Run the demo
python demo.py
==== CONFIG comes from the environment ====
defaults: {... 'environment': 'development', 'max_steps': 6, 'api_key_set': False}
prod without key, validate(): ['ANTHROPIC_API_KEY is required in production']
custom env -> max_steps: 10 | log_level: DEBUG | key set: True (value never logged)
Open config.py and read validate and redacted.
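To make the pattern concrete, here is a minimal sketch of an environment-driven config with a validate and a redacted method. It is not the lab's config.py: the ENVIRONMENT variable name and the INFO default are assumptions for illustration, while AGENT_MAX_STEPS, LOG_LEVEL, ANTHROPIC_API_KEY, and the default of 6 steps come from the demo output.

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Config:
    environment: str
    max_steps: int
    log_level: str
    # repr=False keeps the secret out of print(config) and log lines.
    api_key: str = field(repr=False)

    def validate(self) -> list:
        """Return a list of problems; an empty list means the config is usable."""
        errors = []
        if self.environment == "production" and not self.api_key:
            errors.append("ANTHROPIC_API_KEY is required in production")
        return errors

    def redacted(self) -> dict:
        """A log-safe view: reports whether the key is set, never its value."""
        return {
            "environment": self.environment,
            "max_steps": self.max_steps,
            "log_level": self.log_level,
            "api_key_set": bool(self.api_key),
        }


def load(env=None) -> Config:
    """Build a Config from a mapping of variables (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Config(
        environment=env.get("ENVIRONMENT", "development"),  # name is an assumption
        max_steps=int(env.get("AGENT_MAX_STEPS", "6")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),  # default is an assumption
        api_key=env.get("ANTHROPIC_API_KEY", ""),
    )
```

Passing a plain dict to load, for example load({"ENVIRONMENT": "production"}).validate(), lets you exercise the production check without touching your real environment.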
Step 3: Compare the two probes
In the demo output:
GET /healthz: 200 {'status': 'alive'}
GET /readyz (warmed): 200
Open app.py and read healthz and readyz.
You should now see: /healthz answers "is the process alive?" (fail = restart me), /readyz
answers "should I get traffic yet?" (not-ready = hold traffic back). Two different questions, two
different consequences.
Step 4: Set config via the environment yourself
AGENT_MAX_STEPS=12 LOG_LEVEL=debug python -c "import config; print(config.load().redacted())"
You should now see: max_steps is 12 and log_level is DEBUG, set entirely from the environment.
Same code, different config. This is how one image runs in dev, staging, and prod.
Part B: readiness, statelessness, and the container
Step 5: Watch readiness gate traffic
In the demo output:
GET /healthz while not ready: 200 (still alive)
GET /readyz while not ready: 503 (keep traffic away)
POST /chat while not ready: 503
You should now see: the replica is alive (/healthz 200) but /readyz and /chat return 503, so the
load balancer keeps users off until it is warm. This is what prevents requests from hitting a
half-started replica during a deploy.
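The gating logic can be sketched without a web framework, so it runs anywhere. This is an illustration, not the lab's app.py: each function returns a (status code, body) pair, and the 503 bodies and the warm_up helper are hypothetical. In a FastAPI app the same checks would presumably sit behind the three routes.

```python
# Per-process readiness flag, mirroring the STATE["ready"] name used in the lab.
STATE = {"ready": False}


def healthz():
    """Liveness: 'is the process alive?' If this code runs, the answer is yes."""
    return 200, {"status": "alive"}


def readyz():
    """Readiness: 'should I get traffic yet?' 503 until warm-up has finished."""
    if STATE["ready"]:
        return 200, {"status": "ready"}  # body shape is an assumption
    return 503, {"status": "not ready"}  # body shape is an assumption


def chat(message):
    """Refuse work while not ready, so a half-started replica never answers users."""
    if not STATE["ready"]:
        return 503, {"detail": "not ready"}  # body shape is an assumption
    return 200, {"answer": f"You said: {message}"}


def warm_up():
    """Hypothetical stand-in for real startup work (loading models, opening connections)."""
    STATE["ready"] = True
```

Before warm_up() is called, healthz() still returns 200 while readyz() and chat() return 503. Liveness never depends on the readiness flag, which is why a slow start does not get the process restarted.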
Step 6: See that the service is stateless
In the demo output:
r1: {'answer': 'You said: first', 'session_id': 'a'}
r2: {'answer': 'You said: second', 'session_id': 'b'}
independent (no memory bleed): True
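You should now see: the service keeps no conversation memory between requests; any state lives
outside the process, keyed by session_id (the M21 store). Read the note in
app.py on ChatIn.session_id.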
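A minimal sketch of what "stateless" means for the handler: everything it needs arrives in the request or comes from a store keyed by session_id. SessionStore and handle_chat are hypothetical names, and the in-memory dict only stands in for an external store such as the one from M21.

```python
class SessionStore:
    """In-memory stand-in for an external store; a real one lives outside the process."""

    def __init__(self):
        self._data = {}

    def history(self, session_id):
        # Return a copy so callers cannot mutate stored state by accident.
        return list(self._data.get(session_id, []))

    def append(self, session_id, message):
        self._data.setdefault(session_id, []).append(message)


def handle_chat(store, session_id, message):
    """Handle one request using only its arguments and the store; no module-level memory."""
    store.append(session_id, message)
    return {"answer": f"You said: {message}", "session_id": session_id}


store = SessionStore()
r1 = handle_chat(store, "a", "first")
r2 = handle_chat(store, "b", "second")
# Each session sees only its own messages, whichever replica handled the call.
print(store.history("a"), store.history("b"))
```

Because handle_chat holds nothing between calls, any replica behind the load balancer could serve either session, provided all replicas reach the same store.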
Step 7: Read the production Dockerfile
Open Dockerfile. Find: a slim base, deps installed before code (layer
caching), a non-root appuser, EXPOSE 8000, a HEALTHCHECK that hits /healthz, and pinned
requirements.txt.
You should now see: each line is a habit that prevents a real incident (root containers, unpinned
deps, no healthcheck). If you have Docker:
docker build -t agent-service .
docker run -p 8000:8000 -e ANTHROPIC_API_KEY=sk-... agent-service
Without Docker, this step is read-only.
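A container healthcheck boils down to a command whose exit code says healthy (0) or unhealthy (non-zero). The lab's Dockerfile may well use a different command; the sketch below is one possible probe using only the standard library, handy on slim images that ship without curl. The is_alive and main names and the two-second timeout are assumptions.

```python
import http.client
import urllib.error
import urllib.request


def is_alive(url, timeout=2.0):
    """Return True only if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or an error status all count as unhealthy.
        return False


def main():
    """Map the probe result to an exit code: 0 = healthy, 1 = unhealthy."""
    return 0 if is_alive("http://localhost:8000/healthz") else 1
```

To use it as a script, call sys.exit(main()) under a main guard. Pointing the healthcheck at /healthz rather than /readyz keeps a replica that is merely warming up from being marked unhealthy.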
Step 8: Run it for real (optional)
uvicorn app:app --host 0.0.0.0 --port 8000 --workers 2
curl -s localhost:8000/healthz ; echo
curl -s localhost:8000/readyz ; echo
curl -s -X POST localhost:8000/chat -H 'Content-Type: application/json' -d '{"message":"hello"}'
You should now see: alive, ready, and a chat response, served by 2 worker processes (concurrency).
Ctrl-C to stop.
Step 9: Show it
Post your /healthz and /readyz responses (warmed and not-ready), and one sentence explaining the
difference between liveness and readiness in your own words.
If you get stuck
- ModuleNotFoundError: fastapi -> pip install fastapi "uvicorn[standard]" (from M11).
- /readyz is always 200 -> readiness flips to false only during startup/drain or when you set STATE["ready"]=False; the demo simulates it.
- Where does the API key go? -> the environment (.env locally, the platform's secret store in prod), never the code. config.py reads it from there.
- Docker step fails -> it is optional; the rest of the lab needs no Docker. Check Docker is installed and running.