Skip to content

M5: Prompt engineering

In M4 you got a model talking. But "make this nicer" gets you mush, while a well-built prompt gets you exactly what you pictured. Today you learn to steer: the same model, the same code, wildly different results, just from the words you choose. You'll A/B two prompts on the same input and watch the quality jump.

Today's win: you build a prompt-powered tool from your own life or work, and you can reliably make the model do what you want, proven by A/B-ing a vague prompt against an engineered one.

Today you will

  • Use system vs user prompts, few-shot examples, chain-of-thought, and structured (JSON) output
  • A/B test two prompts on the same input and see the difference with your own eyes
  • Know when prompting alone is enough: and when you'll need more (a preview of RAG and agents)

Run of show (~60 min)

Time What we do
0:00 Hook + the win we're chasing
0:05 The one idea: the prompt is the program you write for the model (full read in notes.md)
0:10 Lab Part A: A/B vague vs engineered; add few-shot
0:35 Lab Part B: structured JSON output + a chain-of-thought try; build your own tool
0:55 Show: post your A vs B difference to the wins board
1:00 Wrap + take-home

If you get stuck

  • No setup today, you reuse M4's key, .env, and libraries. If a call fails, it's almost always the same .env/key fix from M4.
  • Prompts are experiments, there's no single right answer, and "worse" outputs are useful data. Re-read the You should now see line and compare with your partner.
  • If a JSON reply won't parse, run it again (prompt-only JSON isn't guaranteed, M6 fixes that). Nothing here can harm your computer.

Optional challenge

Find a prompt that breaks your tool (vague input, a trick request, another language) and then add one sentence to your system prompt that fixes it. You just did real prompt engineering, and previewed M10's guardrails.