M29 solution: agent deployment and serving
A production-shaped FastAPI service for an agent: config from the environment, liveness and readiness probes, graceful lifecycle, request logging, statelessness, and a proper container. The agent is a stub so the focus stays on serving; the whole thing is verified offline with TestClient.
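"Statelessness" here means a request handler keeps nothing in process memory between requests. The sketch below shows one way per-session memory can be added without breaking that property. It assumes a hypothetical `SessionStore` interface backed by something outside the process, such as Redis or a database. It goes a step beyond the solution's `/chat`, which answers each request independently, and is closer to the per-session-memory starter exercise.

```python
from typing import Dict, List, Protocol


class SessionStore(Protocol):
    """Hypothetical interface for history kept outside the process (Redis, a DB, ...)."""

    def load(self, session_id: str) -> List[str]: ...

    def append(self, session_id: str, message: str) -> None: ...


class InMemoryStore:
    """Test stand-in only: in production this data would live outside every replica."""

    def __init__(self) -> None:
        self._data: Dict[str, List[str]] = {}

    def load(self, session_id: str) -> List[str]:
        return list(self._data.get(session_id, []))  # a copy, so callers cannot mutate it

    def append(self, session_id: str, message: str) -> None:
        self._data.setdefault(session_id, []).append(message)


def stub_agent(history: List[str], message: str) -> str:
    # Placeholder for the model call; history holds alternating user/reply entries.
    return f"[turn {len(history) // 2 + 1}] {message}"


def handle_chat(store: SessionStore, session_id: str, message: str) -> str:
    # Everything this request needs is loaded from the store and written back;
    # nothing is remembered in this process, so any replica can serve any request.
    history = store.load(session_id)
    reply = stub_agent(history, message)
    store.append(session_id, message)
    store.append(session_id, reply)
    return reply
```

With `--workers 2`, uvicorn runs two separate processes, so a module-level dict would silently split one conversation across workers. A shared external store avoids that.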
Files
| File | Role |
|---|---|
| `config.py` | 12-factor `Settings` read from the environment, with `validate()` (fail fast) and `redacted()` (log config without leaking the secret). |
| `app.py` | The service: `/healthz` (liveness), `/readyz` (readiness, 503 until warm), `/chat` (stateless), a request-id access log, and a lifespan that validates config, warms up, and drains on shutdown. |
| `Dockerfile` | A production container: slim base, layer-cached deps, non-root user, `EXPOSE`, `HEALTHCHECK`, multi-worker uvicorn. |
| `requirements.txt` | Pinned dependencies for the container image. |
| `demo.py` | Exercises config, both probes, readiness gating, statelessness, and request ids via `TestClient`. Start here. |
| `../starters/add_serving_feature.py` | Add a metrics endpoint, a rate limit, async concurrency, or per-session memory. |
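The `config.py` row describes three behaviours: read from the environment, fail fast, and log without the secret. A minimal sketch of that shape follows. Only `ANTHROPIC_API_KEY`, `ENVIRONMENT`, and `max_steps` come from this README; the other field names, env var names, and defaults are assumptions for illustration.

```python
import os
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Mapping, Optional


@dataclass(frozen=True)
class Settings:
    # Names/defaults other than ANTHROPIC_API_KEY, ENVIRONMENT, and max_steps are
    # illustrative assumptions, not necessarily what config.py uses.
    model: str = "your-model-id"
    max_steps: int = 8
    request_timeout_s: float = 30.0
    log_level: str = "INFO"
    port: int = 8000
    environment: str = "development"
    api_key: str = field(default="", repr=False)  # kept out of repr() and logs

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        env = os.environ if env is None else env
        return cls(
            model=env.get("MODEL", "your-model-id"),
            max_steps=int(env.get("MAX_STEPS", "8")),
            request_timeout_s=float(env.get("REQUEST_TIMEOUT_S", "30")),
            log_level=env.get("LOG_LEVEL", "INFO"),
            port=int(env.get("PORT", "8000")),
            environment=env.get("ENVIRONMENT", "development"),
            api_key=env.get("ANTHROPIC_API_KEY", ""),
        )

    def validate(self) -> None:
        """Fail fast: raise before serving if this config cannot work."""
        problems = []
        if self.environment == "production" and not self.api_key:
            problems.append("ANTHROPIC_API_KEY is required in production")
        if self.max_steps < 1:
            problems.append(f"MAX_STEPS must be >= 1, got {self.max_steps}")
        if problems:
            raise ValueError("invalid config: " + "; ".join(problems))

    def redacted(self) -> Dict[str, Any]:
        """A loggable view: says whether the key is set, never what it is."""
        data = asdict(self)
        data["api_key_set"] = bool(data.pop("api_key"))
        return data
```

Calling `Settings.from_env().validate()` at startup turns a bad deploy into an immediate crash instead of a service that fails on its first request. Logging `redacted()` shows the effective config safely.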
Run it
```bash
pip install fastapi "uvicorn[standard]" httpx   # httpx is required by FastAPI's TestClient
python demo.py   # offline: config, probes, readiness gating, statelessness
uvicorn app:app --host 0.0.0.0 --port 8000 --workers 2   # serve it (runs in the foreground)
curl -s localhost:8000/readyz ; echo   # from a second terminal
docker build -t agent-service .   # optional: containerize
docker run -p 8000:8000 -e ANTHROPIC_API_KEY=sk-... -e ENVIRONMENT=production agent-service
```
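A single `curl` only checks readiness once. A deploy or smoke-test script usually polls `/readyz` until it returns 200 or a deadline passes. Below is a stdlib-only sketch; the URL, timeout, and interval are assumptions. `get_status` is injectable so the loop can be tested without a running server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of a GET, or 0 if the server is not reachable yet."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:  # 4xx/5xx, e.g. 503 while warming or draining
        return exc.code
    except OSError:  # URLError, connection refused, timeouts
        return 0


def wait_until_ready(
    url: str = "http://localhost:8000/readyz",
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    get_status: Callable[[str], int] = http_status,
) -> bool:
    """Poll the readiness probe until it answers 200; False if the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_status(url) == 200:
            return True
        time.sleep(interval_s)
    return False
```

Polling `/readyz` rather than `/healthz` is deliberate. A process can be alive (liveness 200) while still warming up, and only readiness says it should receive traffic.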
The serving checklist this demonstrates
- Config from the env (12-factor): model, caps, timeouts, log level, port, and the secret key all come from environment variables; `validate()` fails fast on bad config; `redacted()` never logs the key.
- Liveness vs readiness: `/healthz` means "process alive, restart if not"; `/readyz` means "ready for traffic, hold back if not". Readiness returns 503 during startup and while draining.
- Graceful lifecycle: the lifespan validates and warms up on startup, then flips readiness off and drains on shutdown, so deploys and scale-downs do not drop in-flight requests.
- Statelessness: no per-process session memory; state lives outside the process, keyed by `session_id`, so many replicas can run behind a load balancer (the M21 caution, applied).
- Container hygiene: slim image, non-root user, healthcheck, pinned deps.
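The readiness and lifecycle items above fit together as a small state machine. Startup validates and warms up before readiness flips to 200. Shutdown flips it back to 503, then waits, up to a deadline, for in-flight requests to finish. Below is a minimal stdlib sketch with hypothetical names (`Lifecycle`, `handle`, `warm_up`), not the ones in `app.py`. In FastAPI, an async context manager of this shape would be wrapped in a `lifespan(app)` function and passed as `FastAPI(lifespan=...)`.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any, Awaitable, Callable, Tuple


class Lifecycle:
    """Readiness flag plus an in-flight counter (hypothetical sketch, not app.py)."""

    def __init__(self) -> None:
        # Construct inside the running loop: Python < 3.10 binds an Event to a loop at creation.
        self.ready = False
        self.in_flight = 0
        self._idle = asyncio.Event()
        self._idle.set()  # nothing in flight yet

    def readyz(self) -> int:
        return 200 if self.ready else 503

    @asynccontextmanager
    async def lifespan(self, validate: Callable[[], None],
                       warm_up: Callable[[], Awaitable[None]],
                       drain_timeout_s: float = 10.0):
        validate()        # fail fast: an exception here means we never serve
        await warm_up()   # e.g. open clients, load prompts
        self.ready = True
        try:
            yield
        finally:
            self.ready = False  # /readyz -> 503, so the load balancer stops routing here
            try:
                await asyncio.wait_for(self._idle.wait(), drain_timeout_s)
            except asyncio.TimeoutError:
                pass  # deadline hit: stop waiting for stragglers

    async def handle(self, work: Callable[[], Awaitable[Any]]) -> Tuple[int, Any]:
        if not self.ready:
            return 503, None  # requests are gated on readiness
        self.in_flight += 1
        self._idle.clear()
        try:
            return 200, await work()
        finally:
            self.in_flight -= 1
            if self.in_flight == 0:
                self._idle.set()  # lets a pending drain complete
```

The order in `finally` matters. Readiness goes to 503 first so no new traffic arrives, and only then does the drain wait on requests that were already accepted.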
Verified (offline)
- Config: env overrides apply; `validate()` flags a missing key in production and a bad `max_steps`; `redacted()` exposes only `api_key_set`, never the value.
- Service: `/healthz` returns 200 always; `/readyz` returns 200 when warm and 503 when not; `/chat` returns 503 until ready, then a stateless, independent answer per request; each response carries a unique `x-request-id`.
- Fail fast: bad config makes startup raise (the lifespan refuses to serve).
- `Dockerfile` contains a slim base, non-root user, `EXPOSE`, `HEALTHCHECK`, and uvicorn.
- All `.py` files compile. No key needed; production wiring reads the key from the environment.
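Middleware along the following lines can produce a unique `x-request-id` on every response, together with the request-id access log listed for `app.py`. This is a minimal sketch assuming a pure ASGI middleware and a logger named `access`; both are hypothetical, and `app.py` may instead use FastAPI's `@app.middleware("http")` decorator. Starlette and FastAPI accept a class like this via `app.add_middleware(RequestIdMiddleware)`.

```python
import logging
import time
import uuid

access_log = logging.getLogger("access")  # hypothetical logger name


class RequestIdMiddleware:
    """Pure ASGI middleware: unique x-request-id per response, plus one access-log line."""

    def __init__(self, app) -> None:
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":  # lifespan/websocket scopes pass through untouched
            await self.app(scope, receive, send)
            return

        request_id = uuid.uuid4().hex  # minted per request, so unique across workers
        start = time.perf_counter()
        status = 500  # reported if the app fails before starting a response

        async def send_with_id(message):
            nonlocal status
            if message["type"] == "http.response.start":
                status = message["status"]
                headers = list(message.get("headers", []))
                headers.append((b"x-request-id", request_id.encode()))
                message = {**message, "headers": headers}
            await send(message)

        try:
            await self.app(scope, receive, send_with_id)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            access_log.info("%s %s %s %d %.1fms", request_id, scope["method"],
                            scope["path"], status, elapsed_ms)
```

Because the same id appears in the response header and the access-log line, a client-reported id can be matched to the server-side log entry for that request.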