M15: Fine-tuning & training

Prompting and RAG get you a long way, but sometimes you want a model that just knows your style, format, or task without a giant prompt every time. That's fine-tuning: you take a pre-trained model and train it a little further on your own examples. Today you learn how models are trained, build a real fine-tuning dataset with your own hands, and run a fine-tuning job, and learn the most important skill of all: knowing when fine-tuning is the right tool.

Today's win: a validated fine-tuning dataset you built, a fine-tuning job you know how to run, and a clear rule for when to fine-tune vs prompt vs RAG.

Today you will

Understand how models are trained (neural networks → transformers → pre-training → fine-tuning → RLHF), enough to use it, no maths
Build & validate a fine-tuning dataset (chat-format JSONL), the part that actually decides quality
Submit a fine-tuning job (hosted API) and use the result; know the local LoRA path for open models
Decide when to fine-tune (and when prompting/RAG is better)

Run of show (~60 min)

Time	What we do
0:00	Hook + the win we're chasing
0:05	How training works, and what fine-tuning is for (full read in `notes.md`)
0:15	Lab Part A: build & validate a dataset (no key, no GPU)
0:40	Lab Part B: submit a fine-tune job + the when-to-fine-tune decision
0:55	Show: post your dataset + your "fine-tune vs not" call
1:00	Wrap

If you get stuck

Dataset prep needs no key and no GPU: that's the part everyone does today. Submitting a real fine-tune needs a provider account (this module uses OpenAI's fine-tuning API, a different key than Anthropic) or a GPU for the local LoRA path.
Fine-tuning a model that seems wrong is almost always a dataset problem (too few examples, inconsistent style), not a code problem. Re-read the You should now see line.
This is educational: we fine-tune on our own sample data for style/format, responsibly.

Optional challenge

Take the same task and do it three ways, a good prompt (M5), RAG (M7), and a fine-tune, then argue which you'd ship and why (quality, cost, effort, how often the data changes). That judgment is the real lesson of this module.