Skip to content

M6: Driving the model from code (the API, properly)

You can call a model and you can prompt it. Today you take the wheel: the knobs that shape every reply (length, randomness), streaming so words appear as they're written, and the big one, getting back guaranteed structured JSON your code can use, not just prose to read. By the end the API isn't a mystery box; it's a tool you drive.

Today's win: an app that turns messy free-text into clean, guaranteed-valid JSON your code can use, plus fluency with max_tokens, temperature, and streaming.

Today you will

  • Control replies with max_tokens and temperature (and learn why the newest Opus models drop the temperature knob)
  • Stream a reply so it appears word-by-word
  • Get structured JSON output the API guarantees: and parse it into a Python dict you use

Run of show (~60 min)

Time What we do
0:00 Hook + the win we're chasing
0:05 The one idea: a request is data you control; a response is data you parse (full read in notes.md)
0:10 Lab Part A: max_tokens, temperature, streaming
0:35 Lab Part B: build the structured-JSON extractor
0:55 Show: post your extracted record to the wins board
1:00 Wrap + take-home

If you get stuck

  • No new setup, reuse M4's key, .env, and libraries. Same .env/key fixes apply.
  • Two model facts that trip people: temperature errors on claude-opus-4-8 (use claude-haiku-4-5 for the temperature demo, the lab does), and structured JSON output needs a current model (we use ones that support it). Nothing here can harm your computer.
  • Re-read the You should now see line and compare with your partner.

Optional challenge

Change the extractor's schema to pull fields you care about (e.g. add a priority enum to a task extractor, or a due_date). Feed it three messy notes and confirm every reply parses on the first try, that reliability is the whole point of structured output.