Lab M35: five go-deeper operations tools
You'll need: Python and your venv. No API key, no cost, instant and deterministic. Time: about 45 minutes (≈10 min per lab; do them in any order). Work in your breakout pair.
Optional / go-deeper, best after M31–M34. Each lab is a small, self-contained script that extends a module you already built. The inputs are synthetic, but the operations (querying logs, computing signals, sampling, rate-limiting, curating) are the real ones.
flowchart LR
OBS["OBSERVE"] --> L1["1 · structured logs (M20)"]
OBS --> L2["2 · dashboard & golden signals (M20/M31)"]
RESP["RESPOND"] --> L3["3 · online eval / drift (M26/M30)"]
DEP["DEPLOY"] --> L4["4 · rate limits & quotas (M25/M29)"]
IMP["IMPROVE"] --> L5["5 · the flywheel (M30/M31)"]
Step 0: Set up
Copy the solution/ files into a folder and activate your venv. Nothing to install.
python -c "import structured_logging, dashboard, online_eval, rate_limit, improvement; print('go-deeper ok')"
Expected output: go-deeper ok. Run everything at once with python demo.py, or run the labs one at a time below.
Lab 1 — Structured logging & correlation (extends M20)
python structured_logging.py
req-B:
==== C. CORRELATE: the whole story of the failing request (req-B) ====
{... 'event': 'request' ...}
{... 'event': 'tool', 'tool': 'retrieve', 'latency_ms': 5000, 'error': True}
{... 'event': 'response', 'status': 503}
Because every event carries a request_id, you can reconstruct one request's path in a single correlate() call, and query(event='tool', error=True) finds failures across all requests.
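To make the correlate/query idea concrete, here is a minimal sketch of a structured log store. It is not the code in structured_logging.py: the in-memory LOGS list, the log() helper, and the field names other than those shown in the output above are assumptions made for illustration.

```python
import json

# Hypothetical in-memory log store; the lab's script may store records differently.
LOGS = []

def log(request_id, event, **fields):
    """Append one structured record; every record carries the request_id."""
    record = {"request_id": request_id, "event": event, **fields}
    LOGS.append(record)
    return record

def correlate(request_id):
    """Return every record for one request, in the order it was logged."""
    return [r for r in LOGS if r["request_id"] == request_id]

def query(**criteria):
    """Return records whose fields match all of the given key=value pairs."""
    return [r for r in LOGS if all(r.get(k) == v for k, v in criteria.items())]

# Two synthetic requests: req-A succeeds, req-B fails in the retrieve tool.
log("req-A", "request")
log("req-A", "tool", tool="retrieve", latency_ms=40, error=False)
log("req-A", "response", status=200)
log("req-B", "request")
log("req-B", "tool", tool="retrieve", latency_ms=5000, error=True)
log("req-B", "response", status=503)

story = correlate("req-B")                    # one request, end to end
failures = query(event="tool", error=True)    # failing tool calls, all requests

for record in story:
    print(json.dumps(record))
print(failures)
```

The design point is that records are dictionaries rather than free-text lines, so both lookups are plain field comparisons instead of string parsing.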
Lab 2 — Dashboard & the four golden signals (extends M20 + M31)
python dashboard.py
BREACH:
latency p95 5000ms BREACH
error rate 10.0% BREACH
saturation 100% BREACH
SLO burn 10.0x BREACH
Open dashboard.py and read THRESHOLDS: they decide what counts as "too far." (You will tame this with a composite alert in the challenge.)
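The sketch below shows one way signals like these could be computed from a window of requests and compared against thresholds. It is an illustration under stated assumptions, not dashboard.py itself: the threshold values, the 1% error budget, the nearest-rank percentile, and the synthetic window are all invented here.

```python
import math

# Hypothetical limits; the real values live in the lab's THRESHOLDS.
THRESHOLDS = {
    "latency_p95_ms": 1000,
    "error_rate": 0.01,
    "saturation": 0.8,
    "slo_burn": 1.0,
}
SLO_ERROR_BUDGET = 0.01  # assumed: a 99% success target leaves a 1% budget

def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]

def compute_signals(window, in_flight, capacity):
    """Summarise one window of request records into four numbers."""
    error_rate = sum(1 for r in window if r["error"]) / len(window)
    return {
        "latency_p95_ms": p95([r["latency_ms"] for r in window]),
        "error_rate": error_rate,
        "saturation": in_flight / capacity,
        # Burn rate: how many times faster than budgeted errors are arriving.
        "slo_burn": error_rate / SLO_ERROR_BUDGET,
    }

def breaches(signals):
    """Names of the signals that exceed their threshold."""
    return [name for name, value in signals.items() if value > THRESHOLDS[name]]

# Synthetic window: 18 fast successes and 2 slow failures.
window = [{"latency_ms": 100, "error": False}] * 18
window += [{"latency_ms": 5000, "error": True}] * 2

signals = compute_signals(window, in_flight=10, capacity=10)
for name, value in signals.items():
    flag = "BREACH" if name in breaches(signals) else "ok"
    print(f"{name:16} {value:>8} {flag}")
```

With every signal alerting independently, one bad dependency can trip all four at once, which is the noise a composite alert is meant to reduce.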
Lab 3 — Online evaluation (extends M26 + M30)
python online_eval.py
==== B. ONLINE EVAL over the full stream (quality drifted in the 2nd half) ====
{'sampled': 10, 'of': 50, 'avg_score': 0.6, 'drift_detected': True}
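A summary of this shape can be produced by sampling a fraction of live responses, scoring each sample, and comparing early scores with late ones. The following is a minimal sketch under assumptions: the every-5th sampling rule, the precomputed "quality" field standing in for a real scorer, the 0.2 drop threshold, and the synthetic stream are all hypothetical and may not match online_eval.py.

```python
def score(response):
    """Stand-in scorer: reads a precomputed quality value (an assumption;
    a real scorer would grade the response text)."""
    return response["quality"]

def online_eval(stream, sample_every=5, drop_threshold=0.2):
    """Score a deterministic sample and flag a quality drop over time."""
    sampled = stream[::sample_every]          # every Nth response
    scores = [score(r) for r in sampled]
    half = len(scores) // 2
    early = sum(scores[:half]) / half
    late = sum(scores[half:]) / (len(scores) - half)
    return {
        "sampled": len(sampled),
        "of": len(stream),
        "avg_score": round(sum(scores) / len(scores), 2),
        # Drift: the later samples score clearly worse than the earlier ones.
        "drift_detected": (early - late) > drop_threshold,
    }

# Synthetic stream of 50 responses whose quality drops in the second half.
stream = [{"id": i, "quality": 1.0 if i < 25 else 0.2} for i in range(50)]

result = online_eval(stream)
print(result)
```

Sampling keeps the evaluation cost at a fixed fraction of traffic, and comparing halves rather than looking only at the overall average is what separates a recent drop from a uniformly mediocre score.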
Lab 4 — Capacity, rate limits & quotas (extends M25 + M29)
python rate_limit.py
8 requests at t=0: ['allow','allow','allow','allow','allow','429','429','429']
acme: ['allow', 'allow', 'allow', 'over-quota']
acquire x3: [True, True, False]
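The first two output lines are the kind of result a token bucket and a per-tenant quota counter produce. Here is a hedged sketch of both. The class names, the capacity of 5, the refill rate, and the quota of 3 are assumptions chosen for illustration, and rate_limit.py may implement them differently.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, then refill at `refill_per_sec`."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, now):
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return "allow"
        return "429"


class Quota:
    """Fixed allowance per tenant, with no refill inside the period."""

    def __init__(self, limit):
        self.limit = limit
        self.used = {}

    def charge(self, tenant):
        count = self.used.get(tenant, 0)
        if count >= self.limit:
            return "over-quota"
        self.used[tenant] = count + 1
        return "allow"


bucket = TokenBucket(capacity=5, refill_per_sec=1.0)
burst = [bucket.allow(now=0.0) for _ in range(8)]   # 8 requests at t=0
later = bucket.allow(now=2.0)                       # 2 s later, tokens refilled

quota = Quota(limit=3)
acme = [quota.charge("acme") for _ in range(4)]

print(burst, later)
print(acme)
```

Passing the clock in as `now` instead of reading the system time keeps the sketch deterministic. The two limits answer different questions: the bucket bounds how fast requests arrive, while the quota bounds how many a tenant may make in total.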
Lab 5 — Continuous improvement, the flywheel (extends M30 + M31)
python improvement.py
week incidents new repeats_prevented total_guards
1 3 3 0 3
4 1 0 1 4
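One way to read the table: each new kind of failure gets a guard (for example, a regression test), and a later repeat of that failure is caught by the guard. The sketch below models that loop. The failure names and weekly lists are invented for illustration and are not the lab's data, and the counting rules (every observed failure counts as an incident, and a guard is added the moment a failure is first seen) are assumptions about how improvement.py works.

```python
def run_flywheel(weeks):
    """Turn weekly failure lists into week-by-week flywheel rows."""
    guards = set()
    rows = []
    for week_no, failures in enumerate(weeks, start=1):
        new = 0
        prevented = 0
        for failure in failures:
            if failure in guards:
                prevented += 1          # an existing guard caught the repeat
            else:
                guards.add(failure)     # first sighting: curate a new guard
                new += 1
        rows.append({
            "week": week_no,
            "incidents": len(failures),
            "new": new,
            "repeats_prevented": prevented,
            "total_guards": len(guards),
        })
    return rows


# Invented incident lists, one per week.
weeks = [
    ["timeout", "bad_json", "empty_retrieval"],
    ["timeout", "pii_leak"],
    ["bad_json"],
    ["pii_leak"],
]

rows = run_flywheel(weeks)
print("week incidents new repeats_prevented total_guards")
for row in rows:
    print(*row.values())
```

The guard set only grows, so once the stream of new failure kinds slows down, most incidents are repeats that a guard already covers.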
Step 6: Show it
Post one result in the chat: the correlated failing request (Lab 1), a dashboard breach (Lab 2), the drift online eval caught (Lab 3), or the falling incident count (Lab 5).
If you get stuck
- ModuleNotFoundError -> run from inside the folder with the .py files (or run python demo.py).
- My numbers differ -> every lab is deterministic; if you edited the synthetic inputs, the outputs change with them. Re-read the _simulate/_window/_stream/_weeks helper at the bottom of each script.