
M29 solution: agent deployment and serving

A production-shaped FastAPI service for an agent: config from the environment, liveness and readiness probes, a graceful lifecycle, request logging, statelessness, and a production container image. The agent itself is a stub so the focus stays on serving, and the whole service is verified offline with FastAPI's TestClient.

Files

  • config.py: 12-factor Settings read from the environment, with validate() (fail fast) and redacted() (logs the config without leaking the secret).
  • app.py: the service. /healthz (liveness), /readyz (readiness, 503 until warm), /chat (stateless), a request-id access log, and a lifespan that validates config, warms up, and drains on shutdown.
  • Dockerfile: a production container with a slim base, layer-cached dependencies, a non-root user, EXPOSE, HEALTHCHECK, and multi-worker uvicorn.
  • requirements.txt: pinned dependencies for the container image.
  • demo.py: exercises config, both probes, readiness gating, statelessness, and request ids via TestClient. Start here.
  • ../starters/add_serving_feature.py: starter for extending the service with a metrics endpoint, a rate limit, async concurrency, or per-session memory.
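The config.py role above can be sketched with a frozen dataclass. This is a minimal illustration, not the file's actual contents. The field names, the defaults, and the MODEL, MAX_STEPS, TIMEOUT_S, LOG_LEVEL, and PORT variable names are hypothetical. Only ENVIRONMENT and ANTHROPIC_API_KEY are names taken from this README.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import asdict, dataclass, field

LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


@dataclass(frozen=True)
class Settings:
    # Hypothetical fields and defaults; only ENVIRONMENT and ANTHROPIC_API_KEY
    # are variable names taken from the README.
    environment: str = "development"
    model: str = "stub-model"
    max_steps: int = 8
    timeout_s: float = 30.0
    log_level: str = "INFO"
    port: int = 8000
    api_key: str = field(default="", repr=False)  # excluded from repr()

    @classmethod
    def from_env(cls, env: Mapping[str, str] | None = None) -> Settings:
        """Read every setting from the environment (or a given mapping, for tests)."""
        env = os.environ if env is None else env
        return cls(
            environment=env.get("ENVIRONMENT", "development"),
            model=env.get("MODEL", "stub-model"),
            max_steps=int(env.get("MAX_STEPS", "8")),
            timeout_s=float(env.get("TIMEOUT_S", "30")),
            log_level=env.get("LOG_LEVEL", "INFO").upper(),
            port=int(env.get("PORT", "8000")),
            api_key=env.get("ANTHROPIC_API_KEY", ""),
        )

    def validate(self) -> list[str]:
        """Return every problem found; startup should refuse to serve if non-empty."""
        problems = []
        if self.environment == "production" and not self.api_key:
            problems.append("ANTHROPIC_API_KEY is required in production")
        if self.max_steps < 1:
            problems.append(f"max_steps must be >= 1, got {self.max_steps}")
        if self.timeout_s <= 0:
            problems.append(f"timeout_s must be > 0, got {self.timeout_s}")
        if self.log_level not in LOG_LEVELS:
            problems.append(f"unknown log_level {self.log_level!r}")
        return problems

    def redacted(self) -> dict:
        """Config that is safe to log: the key is replaced by a boolean."""
        data = asdict(self)
        data["api_key_set"] = bool(data.pop("api_key"))
        return data
```

Returning a list of problems reports every bad value at once. A lifespan could raise RuntimeError whenever the list is non-empty, which gives the fail-fast startup.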

Run it

pip install fastapi "uvicorn[standard]" httpx   # httpx is required by TestClient (used by demo.py)
python demo.py                          # offline: config, probes, readiness gating, statelessness

uvicorn app:app --host 0.0.0.0 --port 8000 --workers 2     # serve it
curl -s localhost:8000/readyz ; echo

docker build -t agent-service .         # optional: containerize
docker run -p 8000:8000 -e ANTHROPIC_API_KEY=sk-... -e ENVIRONMENT=production agent-service
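The `curl .../readyz` step above is what a deploy script or orchestrator repeats until the service is ready. A stdlib-only sketch of that polling loop follows. The `probe` and `wait_until_ready` helpers are hypothetical and are not part of this solution's files.

```python
import time
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 2.0) -> bool:
    """True only if GET url answers 200; a 503, refused connection, or timeout is False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:  # non-2xx status, e.g. 503 from /readyz
        err.close()
        return False
    except OSError:  # URLError, refused connection, socket timeout
        return False


def wait_until_ready(url: str, deadline_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll url until it reports ready or the deadline passes."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if probe(url):
            return True
        time.sleep(interval_s)
    return False
```

Against the running service, `wait_until_ready("http://localhost:8000/readyz")` would return True once warm-up finishes. Slim base images often ship without curl, so a container HEALTHCHECK could reuse `probe()` on /healthz through `python -c`. That is an option to consider, not necessarily what this Dockerfile does.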

The serving checklist this demonstrates

  • Config from the env (12-factor): model, caps, timeouts, log level, port, and the secret key all come from environment variables; validate() fails fast on bad config; redacted() never logs the key.
  • Liveness vs readiness: /healthz = "process alive, restart if not"; /readyz = "ready for traffic, hold back if not". Readiness returns 503 during startup and draining.
  • Graceful lifecycle: the lifespan validates and warms on startup, flips readiness off and drains on shutdown, so deploys and scale-downs do not drop in-flight requests.
  • Statelessness: no per-process session memory; state lives outside, keyed by session_id, so many replicas can run behind a load balancer (the M21 caution, applied).
  • Container hygiene: slim image, non-root user, healthcheck, pinned deps.
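The readiness and graceful-lifecycle items above come down to a readiness flag plus a count of in-flight requests. The sketch below is framework-free and assumes app.py does something equivalent inside its FastAPI lifespan. The `Lifecycle` and `NotReady` names are hypothetical. Liveness needs no state here, because /healthz just answers 200 while the process runs.

```python
import threading
from contextlib import contextmanager


class NotReady(Exception):
    """Raised when a request arrives during startup or draining (map to HTTP 503)."""


class Lifecycle:
    """Readiness flag plus an in-flight request counter, so shutdown can drain."""

    def __init__(self) -> None:
        self._ready = False  # False during startup and during draining
        self._in_flight = 0
        self._cond = threading.Condition()

    def mark_ready(self) -> None:
        """Call once warm-up has finished."""
        with self._cond:
            self._ready = True

    def is_ready(self) -> bool:
        """What /readyz would report: 200 if True, 503 if False."""
        with self._cond:
            return self._ready

    @contextmanager
    def request(self):
        """Admit a request only while ready, and count it until it finishes."""
        with self._cond:
            if not self._ready:
                raise NotReady("service is starting or draining")
            self._in_flight += 1
        try:
            yield
        finally:
            with self._cond:
                self._in_flight -= 1
                self._cond.notify_all()

    def drain(self, timeout_s: float) -> bool:
        """Stop admitting requests, then wait for in-flight ones. True if all finished."""
        with self._cond:
            self._ready = False
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout=timeout_s)
```

Readiness is flipped off before waiting. The load balancer therefore stops routing new traffic while requests already admitted are allowed to finish, up to `timeout_s`.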

Verified (offline)

  • Config: env overrides apply; validate() flags a missing key in production and a bad max_steps; redacted() exposes only api_key_set, never the value.
  • Service: /healthz always returns 200; /readyz returns 200 when warm and 503 when not; /chat returns 503 until ready, then a stateless, independent answer per request; every response carries a unique x-request-id.
  • Fail fast: bad config makes startup raise, so the lifespan refuses to serve.
  • The Dockerfile contains a slim base, a non-root user, EXPOSE, HEALTHCHECK, and a uvicorn command.
  • All .py files compile. The offline checks need no API key; in production the key is read from the environment.
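This solution's /chat keeps no history at all. If per-session memory is added (one of the starter exercises), the checklist's rule is that the state lives outside the process, keyed by session_id. The framework-free sketch below illustrates that rule and also mints a per-request id, as the x-request-id check expects. `SessionStore`, `DictStore`, `agent_stub`, and `handle_chat` are hypothetical names. `DictStore` only stands in for an external store such as Redis or a database.

```python
from __future__ import annotations

import logging
import uuid
from typing import Protocol

log = logging.getLogger("access")


class SessionStore(Protocol):
    """Anything outside the process that holds history by session_id."""

    def load(self, session_id: str) -> list[str]: ...
    def append(self, session_id: str, message: str) -> None: ...


class DictStore:
    """In-memory stand-in for the external store, for tests only.
    Using it in production would bring back per-process state."""

    def __init__(self) -> None:
        self._data: dict[str, list[str]] = {}

    def load(self, session_id: str) -> list[str]:
        return list(self._data.get(session_id, []))

    def append(self, session_id: str, message: str) -> None:
        self._data.setdefault(session_id, []).append(message)


def agent_stub(history: list[str], message: str) -> str:
    """Placeholder agent: reports how much history it saw."""
    return f"echo({len(history)}): {message}"


def handle_chat(store: SessionStore, session_id: str, message: str) -> tuple[dict, dict]:
    """One request: new id, load state from the store, answer, write state back.
    Nothing is kept in the process between calls, so any replica can serve it."""
    request_id = uuid.uuid4().hex
    history = store.load(session_id)
    answer = agent_stub(history, message)
    store.append(session_id, message)
    store.append(session_id, answer)
    log.info("request_id=%s session_id=%s history=%d", request_id, session_id, len(history))
    return {"x-request-id": request_id}, {"session_id": session_id, "answer": answer}
```

Two calls for the same session_id could be served by different replicas: the second still sees the first exchange because both read the shared store. Each response still gets its own request id.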