
M33: Data and release operations (Part E: Operations Support)

Two things you built earlier quietly rot if nobody operates them. The knowledge base from M7 is not build-once: source documents change, so the index goes stale and starts answering from yesterday's truth. It also needs backups, and it holds data that should expire. And every new release of your agent is a risk: it passed CI (M26), but "probably fine" is not "safe for everyone." Today you safeguard both the databases and the builds. You keep the index fresh and recoverable, and you ship new versions behind a canary that promotes a good build and rolls back a bad one before a single user sees it. All of it runs offline.

Today's win: an index that redacts PII on write, re-embeds only the docs that changed, sweeps expired data, and restores from a backup; plus a release manager that canaries a candidate against the live version and promotes or rolls back automatically, and a secret that rotates with zero downtime.
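
To make the index half of that win concrete, here is a minimal sketch of the four operations on an in-memory store. It is not the contents of index_ops.py: the class shape, the method names (upsert, sweep, backup, restore), the email-only PII pattern, and the record fields are all assumptions chosen for illustration, and the embedding is a hash stub that never calls a model.

```python
import copy
import hashlib
import re

# Assumption: emails are the only PII handled here, to keep the sketch short.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(text: str) -> str:
    """Mask emails before anything is stored or embedded."""
    return EMAIL.sub("[EMAIL]", text)


def digest(text: str) -> str:
    """Content fingerprint used to detect a changed document."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def embed(text: str) -> list[float]:
    """Stub embedding: four hash bytes scaled to [0, 1]. No model call."""
    raw = hashlib.sha256(text.encode("utf-8")).digest()[:4]
    return [b / 255 for b in raw]


class Index:
    """Hypothetical in-memory stand-in for a vector store."""

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}
        self.embed_calls = 0  # counts re-embeds so you can see what was skipped

    def upsert(self, doc_id: str, text: str, now: float, ttl: float) -> bool:
        """Redact, then re-embed only if the redacted content changed."""
        clean = redact(text)
        fingerprint = digest(clean)
        old = self.docs.get(doc_id)
        if old is not None and old["digest"] == fingerprint:
            return False  # unchanged: skip the embedding work
        self.embed_calls += 1
        self.docs[doc_id] = {
            "text": clean,
            "digest": fingerprint,
            "vector": embed(clean),
            "expires_at": now + ttl,
        }
        return True

    def sweep(self, now: float) -> list[str]:
        """Delete every record whose TTL has passed; return the removed ids."""
        expired = [d for d, rec in self.docs.items() if rec["expires_at"] <= now]
        for doc_id in expired:
            del self.docs[doc_id]
        return expired

    def backup(self) -> dict:
        """Return a deep copy so later writes cannot alter the snapshot."""
        return copy.deepcopy(self.docs)

    def restore(self, snapshot: dict) -> None:
        """Replace the live records with a copy of the snapshot."""
        self.docs = copy.deepcopy(snapshot)
```

Passing `now` in explicitly, rather than reading the clock inside the class, keeps retention testable: you can "expire" a document by calling `sweep` with a later timestamp instead of waiting. Note that the fingerprint is taken after redaction, so two versions of a document that differ only in a masked email count as unchanged.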

Today you will

  • Redact PII on write and keep the RAG index from leaking data it never should have stored (M14/M30)
  • Detect a stale document and re-embed only what changed, not the whole store
  • Apply retention (TTL) and prove a backup restores the index
  • Canary a release against the live baseline on an eval set, then promote or roll back (M26)
  • Rotate a secret with a grace window so a rotation never causes an outage
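
The canary goal above can be sketched as a single comparison: score the live version and the candidate on the same eval set, then keep whichever one the gate allows. This is a hedged illustration, not the lab's release manager. The names `pass_rate`, `canary`, and `max_regression`, and the exact-match scoring, are assumptions.

```python
from typing import Callable, Sequence, Tuple

Agent = Callable[[str], str]
EvalSet = Sequence[Tuple[str, str]]  # (prompt, expected answer) pairs


def pass_rate(agent: Agent, eval_set: EvalSet) -> float:
    """Fraction of eval cases the agent answers exactly right."""
    passed = 0
    for prompt, expected in eval_set:
        try:
            if agent(prompt) == expected:
                passed += 1
        except Exception:
            pass  # a crash counts as a failed case, not a failed run
    return passed / len(eval_set)


def canary(
    live: Agent,
    candidate: Agent,
    eval_set: EvalSet,
    max_regression: float = 0.0,
) -> Tuple[str, Agent]:
    """Promote the candidate unless it scores worse than live by more than
    max_regression; otherwise roll back to live. Returns (decision, active)."""
    baseline = pass_rate(live, eval_set)
    score = pass_rate(candidate, eval_set)
    if score >= baseline - max_regression:
        return "promote", candidate
    return "rollback", live
```

Comparing against the live baseline rather than a fixed threshold means the gate asks "is this build worse than what users have today?", which stays meaningful even when the eval set itself is hard.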

Run of show (about 60 minutes)

Time   What we do
0:00   Hook: the index and the deploys both rot without operations
0:05   The one idea: operate the data and gate the builds (read notes.md)
0:12   Lab Part A: index ops, staleness, retention, backup/restore
0:32   Lab Part B: canary, promote/rollback, and secret rotation
0:52   Show: post your stale-doc reindex and your rolled-back bad release
1:00   Wrap

If you get stuck

  • This module safeguards the databases (the M7 vector store) and the builds (the M11/M26/M29 deploy). The canary reuses the eval idea from M20/M26; PII redaction is the privacy rule from M14/M30.
  • The whole lab runs offline and instantly, at no cost and with no API key. The Index is an in-memory stand-in for a vector store and embeddings are stubbed as a hash, so the operations are real but nothing calls a model.
  • Data ops live in index_ops.py; release ops in release_ops.py. Read the one for the step you are on.
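
If the release-side step you are stuck on is secret rotation, the grace-window idea can be sketched in a few lines. This is an orientation aid under stated assumptions, not the contents of release_ops.py: the class name `RotatingSecret` and its methods are hypothetical. The point it shows is that the old secret stays valid for a bounded window after rotation, so clients that have not yet picked up the new one keep working.

```python
import hmac
from typing import Optional


def _same(a: str, b: str) -> bool:
    """Constant-time comparison, so timing does not leak how much matched."""
    return hmac.compare_digest(a.encode("utf-8"), b.encode("utf-8"))


class RotatingSecret:
    """Hypothetical holder for a current secret plus one previous secret."""

    def __init__(self, secret: str) -> None:
        self.current = secret
        self.previous: Optional[str] = None
        self.previous_valid_until = 0.0

    def rotate(self, new_secret: str, now: float, grace: float) -> None:
        """Install new_secret; keep the old one valid until now + grace."""
        self.previous = self.current
        self.previous_valid_until = now + grace
        self.current = new_secret

    def is_valid(self, presented: str, now: float) -> bool:
        """Accept the current secret, or the previous one inside the window."""
        if _same(presented, self.current):
            return True
        return (
            self.previous is not None
            and now < self.previous_valid_until
            and _same(presented, self.previous)
        )
```

As with the index sketch, time is passed in as `now` so the expiry of the grace window can be checked without sleeping.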

Optional challenge

Open starters/canary_ramp.py and implement a progressive canary: instead of an all-or-nothing promotion, ramp the candidate's traffic share 1% → 10% → 50% → 100%, checking the error rate at each stage and rolling back the moment any stage breaches the budget. This is how real systems limit the blast radius of a bad deploy to a small slice of users.
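
One possible shape for the ramp loop, offered as a starting point and not as the expected solution: the starter file may define a different interface. Here `error_rate_at` is a hypothetical callback that returns the error rate observed while the candidate serves a given share of traffic, and `budget` is the maximum error rate you tolerate.

```python
from typing import Callable, Sequence, Tuple

# Traffic shares from the challenge: 1% -> 10% -> 50% -> 100%.
STAGES = (0.01, 0.10, 0.50, 1.00)


def ramp(
    error_rate_at: Callable[[float], float],
    budget: float,
    stages: Sequence[float] = STAGES,
) -> Tuple[str, float]:
    """Walk the stages in order. Roll back at the first stage whose error
    rate exceeds the budget; promote only if every stage stays within it.
    Returns (decision, share), where share is the stage that decided it."""
    for share in stages:
        if error_rate_at(share) > budget:
            return "rollback", share  # later stages are never exposed
    return "promote", stages[-1]
```

Because the loop returns on the first breach, a build that fails at the 1% stage is never measured at 10%, 50%, or 100%, which is the blast-radius limit the challenge describes.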