M10 solution
The expected, fully-commented artifacts for M10's lab. Peek only after you've tried.
| File | What it is |
|---|---|
guardrails.py |
The three guards (plain Python): screen_input (injection/jailbreak), screen_output (secret leak), screen_tool (least-privilege allow-list for excessive agency). |
redteam.py |
A small support bot with a hidden secret + a red-team set + a scorecard comparing guardrails OFF vs ON (leaks, false-blocks). |
Run it
With your venv active and your M4 .env present:
python redteam.py
How this was verified
Verified on Python 3:
- Guardrails verified for real (pure Python, no key): the input guard blocks the injection /
prompt-leak / jailbreak attacks and allows the benign control; the output guard blocks the secret and
passes clean text; the tool guard allow-lists safe tools and refuses dangerous (send_email) and
unknown tools.
- Red-team harness verified with a mocked vulnerable model: with guardrails OFF the secret
leaks (3 leaks), with guardrails ON leaks drop to 0 and the benign control is still answered
(0 false-blocks).
All data is synthetic and red-teaming here targets your own practice app only. The only unverified step is the live model call (the learner's key), and note that a strong model may resist some attacks even with guardrails OFF; that's a bonus, not a substitute for the guardrails. No API key or billed call was used here.