M15: Fine-tuning & training
Prompting and RAG get you a long way, but sometimes you want a model that just knows your style, format, or task without a giant prompt every time. That's fine-tuning: you take a pre-trained model and train it a little further on your own examples. Today you learn how models are trained, build a real fine-tuning dataset with your own hands, and run a fine-tuning job, and learn the most important skill of all: knowing when fine-tuning is the right tool.
Today's win: a validated fine-tuning dataset you built, a fine-tuning job you know how to run, and a clear rule for when to fine-tune vs prompt vs RAG.
Today you will
- Understand how models are trained (neural networks → transformers → pre-training → fine-tuning → RLHF), enough to use it, no maths
- Build & validate a fine-tuning dataset (chat-format JSONL), the part that actually decides quality
- Submit a fine-tuning job (hosted API) and use the result; know the local LoRA path for open models
- Decide when to fine-tune (and when prompting/RAG is better)
Run of show (~60 min)
| Time | What we do |
|---|---|
| 0:00 | Hook + the win we're chasing |
| 0:05 | How training works, and what fine-tuning is for (full read in notes.md) |
| 0:15 | Lab Part A: build & validate a dataset (no key, no GPU) |
| 0:40 | Lab Part B: submit a fine-tune job + the when-to-fine-tune decision |
| 0:55 | Show: post your dataset + your "fine-tune vs not" call |
| 1:00 | Wrap |
If you get stuck
- Dataset prep needs no key and no GPU: that's the part everyone does today. Submitting a real fine-tune needs a provider account (this module uses OpenAI's fine-tuning API, a different key than Anthropic) or a GPU for the local LoRA path.
- Fine-tuning a model that seems wrong is almost always a dataset problem (too few examples, inconsistent style), not a code problem. Re-read the You should now see line.
- This is educational: we fine-tune on our own sample data for style/format, responsibly.
Optional challenge
Take the same task and do it three ways, a good prompt (M5), RAG (M7), and a fine-tune, then argue which you'd ship and why (quality, cost, effort, how often the data changes). That judgment is the real lesson of this module.