Lab M22: make an agent survive the real world
You'll need: your venv and the anthropic and python-dotenv libraries from M4. The core lab needs no
API key, costs nothing, and runs instantly (we inject fake failures and stub out the backoff waits).
A live run at the end is optional. Time: about 45 minutes. Work in your breakout pair.
Heads up: we make the model fail on purpose. It errors, hangs, loops, and tries to do something risky, and you watch each reliability pattern handle it. The "risky" tool only pretends to send an email; nothing leaves your machine and nothing can be harmed.
This lab has two parts:
- Part A: retry, timeout, and graceful degrade (surviving a flaky or down service).
- Part B: step caps and the human-approval gate (surviving the agent itself).
flowchart TB
T["task"] --> CAP{"step cap<br/>tick()"}
CAP -->|ok| CALL["model call"]
CALL --> TO["timeout"]
TO --> RE["retry + backoff"]
RE -->|still failing| DEG["safe message<br/>(degrade)"]
RE -->|tool wanted| GATE{"risky?<br/>approval gate"}
GATE -->|safe / approved| RUN["run tool"]
GATE -->|denied| BLK["blocked"]
CAP -->|over cap| STOP["stop: runaway"]
Part A: surviving a flaky or down service
Step 1: Set up
Copy the solution/ files and starters/.env.example into
a folder. Activate your venv.
python -c "import anthropic, dotenv; print('deps ok')"
You should see: deps ok. (If not, run pip install anthropic python-dotenv, the M4 libraries.)
Step 2: Run the fault-injection demo
python demo_mock.py
==== A. RETRY: model fails twice, then works ====
{'answer': 'The answer is 391.', 'steps': 2, 'blocked': [], 'degraded': False}
You should now see: the model failed twice, retry waited and tried again, so the agent
still got the right answer. A blip did not become a failure.
Step 3: See the timeout fire
Look at section E of the same output:
==== E. TIMEOUT: a slow call is given up on ====
caught: call timed out after 0.05s ...
You should now see: the slow call was abandoned at its deadline (in the agent, retry would then retry).
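One common way to give up on a slow call is to run it in a worker thread and stop waiting at a deadline. The sketch below is only an illustration of that idea: with_deadline and CallTimeout are hypothetical names, and the lab's call_with_deadline in reliability.py may be built differently.

```python
import concurrent.futures
import time


class CallTimeout(Exception):
    """Hypothetical error: the call did not finish before its deadline."""


def with_deadline(fn, seconds):
    """Run fn() in a worker thread and stop waiting after `seconds`."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=seconds)
    except concurrent.futures.TimeoutError:
        raise CallTimeout(f"call timed out after {seconds}s") from None
    finally:
        # Do not block on the abandoned worker; it finishes in the background.
        pool.shutdown(wait=False)


def slow_call():
    time.sleep(0.3)  # stands in for a hung model call
    return "too late"


try:
    with_deadline(slow_call, 0.05)
    outcome = "finished"
except CallTimeout as exc:
    outcome = f"caught: {exc}"

print(outcome)  # caught: call timed out after 0.05s
```

Note that this approach stops waiting for the call but does not kill it; the worker thread still runs to completion in the background.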
Open reliability.py and read call_with_deadline and retry.
You should now see: retry only re-runs on transient errors and waits longer each time (backoff).
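The two properties named here (retry only transient errors, wait longer each time) can be sketched in a few lines. This is a minimal illustration, not the lab's code: TransientError, retry_call, and the 0.5-second base delay are assumptions, and the injectable sleep mirrors how the demo stubs out its waits.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (rate limit, network blip)."""


def retry_call(fn, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on TransientError wait base_delay * 2**attempt, then try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    # Any other exception type is not caught, so it propagates immediately.


calls = {"n": 0}


def flaky():
    """Fails twice with a transient error, then succeeds."""
    calls["n"] += 1
    if calls["n"] <= 2:
        raise TransientError("temporary blip")
    return "The answer is 391."


waits = []  # record the requested waits instead of really sleeping
result = retry_call(flaky, sleep=waits.append)
print(result, waits)  # The answer is 391. [0.5, 1.0]
```

Passing sleep as a parameter is what lets a test or demo run instantly while production code keeps the real waits.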
Step 4: See graceful degradation
Look at section C:
==== C. GRACEFUL DEGRADE: outage, all retries fail ====
{'answer': 'Service unavailable, please try again later. ...', 'degraded': True}
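You should now see: every retry failed, so the agent returned a safe message and marked the result as degraded. Failing safely is the goal when you cannot succeed.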
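Graceful degradation can be as small as one try/except around the whole call. The sketch below assumes a hypothetical answer_or_degrade helper and a made-up FALLBACK message; the real agent.py may structure this differently.

```python
# Hypothetical fallback text; the lab's actual message may differ.
FALLBACK = "Service unavailable, please try again later."


def answer_or_degrade(call):
    """Return the real answer, or a safe message flagged as degraded."""
    try:
        return {"answer": call(), "degraded": False}
    except Exception:
        # The caller gets a usable, honest result instead of a stack trace.
        return {"answer": FALLBACK, "degraded": True}


def down_service():
    raise ConnectionError("service is down")


print(answer_or_degrade(down_service))
print(answer_or_degrade(lambda: "The answer is 391."))
```

The degraded flag matters as much as the message: it lets the calling code tell a real answer from a fallback.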
Part B: surviving the agent itself
Step 5: Stop a runaway loop
Look at section B:
==== B. STEP CAP: model loops forever, cap stops it ====
{'answer': 'Stopped: exceeded 4 steps (possible runaway loop)', 'steps': 5, 'degraded': True}
You should now see: the model kept looping until StepLimiter stopped it at the cap. This is your hard cost-safety control (and
M20's observability is how you would have spotted the loop).
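A step cap is just a counter that raises once the loop goes past its limit. The sketch below is an assumption about how such a limiter could look, not the lab's StepLimiter: StepCounter, StepLimitExceeded, and run_loop are hypothetical names.

```python
class StepLimitExceeded(Exception):
    """Hypothetical error raised when the loop goes past its cap."""


class StepCounter:
    """Counts loop iterations and raises once the cap is exceeded."""

    def __init__(self, max_steps):
        self.max_steps = max_steps
        self.steps = 0

    def tick(self):
        self.steps += 1
        if self.steps > self.max_steps:
            raise StepLimitExceeded(
                f"exceeded {self.max_steps} steps (possible runaway loop)"
            )


def run_loop(wants_another_step, max_steps=4):
    """Drive an agent-style loop; wants_another_step() stands in for the model."""
    counter = StepCounter(max_steps)
    try:
        while True:
            counter.tick()  # check the cap before doing any work
            if not wants_another_step():
                return {"answer": "done", "steps": counter.steps, "degraded": False}
    except StepLimitExceeded as exc:
        return {"answer": f"Stopped: {exc}", "steps": counter.steps, "degraded": True}


result = run_loop(lambda: True)  # a "model" that never stops asking for more
print(result)
```

In this sketch the counter reads 5 when a cap of 4 trips, because the fifth tick is the one that crosses the limit.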
Step 6: Block a risky action behind human approval
Look at section D:
==== D. APPROVAL GATE: a risky action ====
default (deny): {... 'blocked': ['send_email'] ...}
human approves: {... 'blocked': [] ...}
You should now see: the model asked to call send_email, a world-changing action. By default the agent BLOCKED it (no human
said yes). When an approver returns yes, the same action runs. Open agent.py:
multiply is in SAFE_TOOLS and runs freely; send_email is in RISKY_TOOLS and must pass
approval_gate. The agent proposes; a human decides (human-in-the-loop, M14).
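The deny-by-default idea can be sketched as follows. The names SAFE_TOOLS, RISKY_TOOLS, and approval_gate come from the lab, but the bodies here are assumptions for illustration, and deny_all and dispatch are hypothetical helpers; read agent.py for the real version.

```python
SAFE_TOOLS = {"multiply"}
RISKY_TOOLS = {"send_email"}


def deny_all(tool_name, args):
    """Default approver: no human said yes, so the answer is no."""
    return False


def approval_gate(tool_name, args, approver=deny_all):
    """Return True only if the tool is safe or a human approved it."""
    if tool_name in SAFE_TOOLS:
        return True
    if tool_name in RISKY_TOOLS:
        return bool(approver(tool_name, args))
    return False  # unknown tools are refused as well


def dispatch(tool_name, args, approver=deny_all):
    """Report whether the tool would run; nothing is actually sent or computed."""
    if not approval_gate(tool_name, args, approver):
        return {"ran": False, "blocked": [tool_name]}
    return {"ran": True, "blocked": []}


email = {"to": "someone@example.com", "body": "hi"}
print(dispatch("send_email", email))                                   # blocked
print(dispatch("send_email", email, approver=lambda name, args: True))  # runs
print(dispatch("multiply", {"a": 17, "b": 23}))                         # runs
```

Making denial the default means a missing or broken approver fails closed rather than open.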
Step 7: Tune a policy yourself
From your terminal, prove the cap is yours to set:
python -c "import agent, demo_mock as d; print(agent.run('loop', client=d.looping_client(), max_steps=2, sleep=lambda x:None))"
You should see: Stopped: exceeded 2 steps .... You changed the safety limit and the agent
obeyed it. Try max_steps=8 to see it run longer before stopping.
Step 8 (optional, costs a few tokens): a real run
Put your key in .env (copy .env.example), then:
cp .env.example .env # then edit .env and paste your key
python agent.py
You should see a result with degraded: False.
The reliability wrappers are invisible when nothing fails, which is exactly the point. Steps 1 to 7 need
no key.
Step 9: Show it
Post in the chat one section from the demo where a fault was handled: the retry recovery (A), the loop stopped (B), the safe degrade (C), or the blocked risky action (D).
If you get stuck
- ModuleNotFoundError: anthropic -> pip install anthropic python-dotenv (M4 libraries).
- demo_mock.py cannot find agent / reliability -> run it from inside the folder with the solution .py files.
- The demo seems to pause -> it should not; backoff sleep is stubbed with sleep=lambda x: None. If you call agent.run yourself, pass that too or it will really wait.
- ANTHROPIC_API_KEY error in Step 8 -> your .env is not named exactly .env, or the key line is wrong. See api-keys.md. Steps 1 to 7 need no key.