M30: Agent data and feedback loops (Part D: Agentic Systems)

A deployed agent is not the end; it is a data source. Every real question it answers, and every thumbs up, thumbs down, or correction a user gives, is the most valuable signal you have for making it better, more valuable than any dataset you could buy. Today you build the flywheel: capture production interactions and feedback, curate them into new eval cases (M26) and fine-tuning examples (M15), and feed them back in. Usage compounds into quality, and you do it without leaking anyone's private data.

Today's win: a feedback pipeline that turns real interactions into a regression eval case AND a corrected training example, with PII redacted on the way in, all demonstrated offline.

Today you will

Log production interactions (question, answer, sources, feedback) with PII redacted at write time
Read the three core signals: thumbs up, thumbs down + correction, and down with no fix
Curate logs into eval cases (M26): up becomes golden, down+correction becomes a regression
Curate logs into fine-tuning examples (M15): learn from good answers and corrected ones
See the flywheel: usage becomes the data that improves the agent, while respecting privacy (M14)

Run of show (about 50 minutes)

Time	What we do
0:00	Hook: your agent is a data source
0:05	The one idea: capture, curate, feed back (read `notes.md`)
0:12	Lab Part A: log interactions with feedback, and redact PII
0:28	Lab Part B: curate into eval cases and fine-tuning examples
0:45	Show: post a down-vote that became a regression test and a training fix
0:50	Wrap

If you get stuck

Connects M20/M26 (evals) with M15 (fine-tuning) and M14 (privacy). The whole lab runs offline, free, no key (logging and curation are plain Python over JSONL).
No new libraries. Nothing here can harm your computer; all the data is synthetic.
The mental model: feedback is labeled data you already paid to generate. Curation is the judgement step that turns it into evals and training data.

Optional challenge

Open starters/add_signal.py and add an implicit feedback signal: treat a user's edit of the answer as a correction, or "try again" as a weak negative. Implicit signals are far more plentiful than thumbs, but noisier, so curate them carefully (and redact PII first).