Skip to content

M10 solution

The expected, fully-commented artifacts for M10's lab. Peek only after you've tried.

File What it is
guardrails.py The three guards (plain Python): screen_input (injection/jailbreak), screen_output (secret leak), screen_tool (least-privilege allow-list for excessive agency).
redteam.py A small support bot with a hidden secret + a red-team set + a scorecard comparing guardrails OFF vs ON (leaks, false-blocks).

Run it

With your venv active and your M4 .env present:

python redteam.py

How this was verified

Verified on Python 3: - Guardrails verified for real (pure Python, no key): the input guard blocks the injection / prompt-leak / jailbreak attacks and allows the benign control; the output guard blocks the secret and passes clean text; the tool guard allow-lists safe tools and refuses dangerous (send_email) and unknown tools. - Red-team harness verified with a mocked vulnerable model: with guardrails OFF the secret leaks (3 leaks), with guardrails ON leaks drop to 0 and the benign control is still answered (0 false-blocks).

All data is synthetic and red-teaming here targets your own practice app only. The only unverified step is the live model call (the learner's key), and note that a strong model may resist some attacks even with guardrails OFF; that's a bonus, not a substitute for the guardrails. No API key or billed call was used here.