Skip to content

Lab: M14: test for bias, protect privacy

You'll need: your M4 setup (venv, anthropic) for Part A; Part B is pure Python (no key). Time: ~45 minutes • Work in your breakout pair.

Heads up: this is educational and authorized: we probe our own app to make it fairer and safer; we don't build anything harmful. Bias findings are sensitive, discuss them thoughtfully. Nothing here can harm your computer.

This lab has two parts: - Part A: probe a model for unfair treatment (needs your key). - Part B: redact personal data before it's sent (no key).

flowchart LR
  Same["same task"] --> Swap["swap a sensitive attribute"] --> Cmp["compare outputs"] --> Judge["human judges: bias?"]
  PII["user text"] --> Red["redact PII"] --> Safe["safe to send"]

Part A: fairness probe

Step 1: Set up

Put bias_probe.py, privacy.py (from solution/) and responsible_starter.py (from starters/) in a folder with your M4 .env. Activate your venv.

You should now see: (.venv) and those files.

Step 2: Run the bias probe

python bias_probe.py
It asks the model for a suggested salary for the same role, changing only the candidate's name.

You should now see: for each pair, two suggested salaries and either ~ same suggestion (good) or DIFFERENT by $…, investigate. If a number moves when only the name changed, you've found something worth investigating, that's a fairness signal you could never spot by trying it once.

Step 3: Judge it (the human part)

With your partner, look at any flagged pair. Is the difference a real bias (a stereotype), or just noise? Re-run once or twice, does the gap persist?

You should now see / say: "a difference isn't automatic proof, I decide if it reflects a stereotype, and whether it's consistent." That human judgment is the responsible part; the tool just surfaces it.

Step 4: Read how the probe works

Open bias_probe.py. Note the PROBES pairs differ only by a sensitive attribute, and the task is numeric so differences are measurable.

You should now see / say: "same task, swap a should-not-matter attribute, compare." It's M8's eval mindset and M10's red-team mindset, pointed at fairness.


Part B: protect privacy (no key needed)

Step 5: Run the redactor

python privacy.py

You should now see: a sample message with the email, phone, SSN, and card number replaced by [… REDACTED], and a count of what was removed. This runs entirely on your machine, perfect for cleaning text before you send it to a hosted model.

Step 6: Add your own pattern (finish the starter)

Open responsible_starter.py. Add a PII pattern to EXTRA_PATTERNS (TODO 1), e.g. a postcode/zip, and test it on text containing one.

You should now see: your new pattern getting redacted too. (Regex is a first line, names and odd formats slip through, so also collect less data in the first place.)

Step 7: Wire it into a real app

Picture your M7 RAG or M9 agent: where would you call redact_pii so user input is cleaned before the model (or a tool) ever sees it? Write the one line.

You should now see / say: redact at the boundary: as soon as user input arrives, before it reaches the model, logs, or tools. Privacy by design.

Step 8: The responsible-AI checklist

Skim the duties table in notes.md. For your capstone idea, name one thing you'll do for fairness, privacy, and human oversight.

You should now see: three concrete commitments for your own app. That's responsible AI, habits, not an afterthought.

Stuck? Working examples are in ../solution/.


Your win

You can test an AI app for unfair treatment, strip personal data before it's sent, and name the responsible-AI duties you own as the engineer.

Post it to the chat wins board: "Same job, just a different name → a $7k salary gap my app suggested. Found it, flagged it. And my redactor scrubs PII before anything's sent "

Take-home (optional)

Add a fairness probe to your capstone's eval set (M8) and redact_pii at its input boundary. Combined with M10's guardrails, that's a genuinely responsible app, exactly what the capstone's "how would you secure it?" question is really asking.