Lab: M5: prompt engineering

You'll need: your M4 project setup, venv active ((.venv)), your key in .env, anthropic + python-dotenv installed. No new setup. Time: ~45 min • Work in your breakout pair.

Heads up: prompts are experiments. There's no single right answer, and a "bad" output is useful information, not a failure. Run things more than once, the model varies. Nothing here can harm your computer or cost more than a fraction of a cent per call.

This lab has two parts: - Part A: A/B a vague prompt against an engineered one, then add a few-shot example. - Part B: get structured JSON output, try chain-of-thought, and build your own tool.

flowchart LR
  In["one input"] --> A["vague prompt"] --> Oa["generic output"]
  In --> B["engineered prompt<br/>role · rules · few-shot · output shape"] --> Ob["exactly what you wanted"]

Part A: steer the model

Step 1: Set up the folder

Put ab_compare.py, rewrite.py (from solution/), and prompt_lab_starter.py (from starters/) in a folder with your .env from M4. Activate your venv.

You should now see: (.venv) in your prompt, and the three files plus .env in the folder.

Step 2: A/B: vague vs engineered (the big reveal)

python ab_compare.py

You should now see: the same blunt message rewritten twice. Output A (from "Make this message nicer") is generic or off-target; output B (from a prompt with a role, clear rules, and a defined output) is noticeably warmer, tighter, and on-brief. Same model, same input, only the prompt changed. That gap is what this whole module is about.

Step 3: Read why B wins

Open ab_compare.py. Compare NAIVE_SYSTEM and ENGINEERED_SYSTEM. With your partner, name the three things B adds.

You should now see / say: B gives the model a role ("a thoughtful colleague"), explicit rules (keep facts, 2-4 sentences, no blame), and a defined output ("only the rewritten message"). Those three moves, role, rules, output, are most of prompt engineering.

Step 4: Few-shot: show, don't just tell

Open rewrite.py and find EXAMPLES, one worked example (a blunt input and the ideal answer). This is few-shot prompting: the model copies the pattern. Run it and paste a blunt message:

python rewrite.py

You should now see: a rewrite that matches the example's shape and tone closely and reliably. Giving even one example pins the format far better than describing it in words.

Part B: structure, reasoning, and your own tool

Step 5: Structured output you can use in code

Notice rewrite.py asked for JSON and parsed it, so the result is a Python dict with subject, body, and tone_note, not just a blob of text. Look at its printout.

You should now see: the reply split into a Subject, a body, and a "what changed" note, separate fields your code could email, save, or display. Asking for JSON turns prose into data. (Prompt-only JSON usually works but isn't guaranteed; if it fails to parse, run again, M6 shows the API feature that guarantees it.)

Step 6: Chain-of-thought: ask it to think first

Open prompt_lab_starter.py. Set the input to a reasoning question and A/B "answer" vs "reason":

INPUT_TEXT = "A shop had 3 boxes of 12 apples. They sold 20 apples and threw out 5 bruised ones. How many are left? "
PROMPT_A = "Answer with just the final number."
PROMPT_B = "Think step by step, show your working, then give the final number."

Run it: python prompt_lab_starter.py

You should now see: B lays out the steps (36 − 20 − 5 = 11) and lands it; A just asserts a number. For anything involving steps or logic, asking the model to think first ("chain-of-thought") makes it more accurate.

Step 7: Build YOUR tool

Back in prompt_lab_starter.py, set INPUT_TEXT to a real task from your life or work (rewrite an email, classify some feedback, generate study questions from notes). Engineer PROMPT_B with a role + rules + output shape (add a few-shot example if you like). Re-run and tune until B clearly beats A.

You should now see: your own prompt-powered tool producing output you'd actually use, and a visible quality gap between your first vague attempt and your engineered one.

Stuck? Working examples are in ../solution/. Peek only after you've tried.

Your win

You can reliably steer a model: role, rules, few-shot, chain-of-thought, and structured output, and you proved it by A/B-ing your own tool.

Post it to the chat wins board: your A vs B, e.g. "A: 'Sure, here's a nicer version...' → B: a crisp 3-line email I'd actually send. Same model, better prompt "

Take-home (optional)

Take a prompt you use in a chatbot app (ChatGPT/Claude) and rewrite it with role + rules + one example. Notice how much more consistent the results get. Prompting is a skill you'll use every day from here on.