Lab: M17: train a language model from scratch
You'll need: your venv. Part A: pip install numpy. Part B (optional): pip install torch
(use Python 3.10-3.12). No API key anywhere. Time: ~45 min • Work in your breakout pair.
Heads up: the generated text will look like gibberish with a flavour. That's expected from a tiny model on tiny data. Watch the loss drop (that's learning) and read the comments; quality isn't the point, understanding is. Maths-shy? Just watch what happens. Nothing here can harm your computer.
This lab has two parts:
- Part A: train a tiny model from scratch (numpy) and generate text.
- Part B: read & run a real transformer in miniature (PyTorch).
flowchart LR
Pred["predict next token"] --> Loss["measure error (loss)"] --> Nudge["nudge weights"] --> Pred
Pred -->|trained| Gen["generate text"]
Part A: train a model from scratch
Step 1: Set up
Activate your venv, then install numpy:
pip install numpy
Put tiny_lm.py (from solution/) and train_your_text.py (from starters/) in a folder.
You should now see: Successfully installed numpy-… and the files in your folder.
Step 2: Train it and watch it learn
python tiny_lm.py
You should now see: loss before training ≈ 2.2, then loss after training ≈ 0.5, then a
generated string like 'held. worlorlo worlohellld...'. The loss dropping is the model learning to
predict the next character; the generated text echoes the training data's patterns. You trained a
language model from nothing.
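Why does the loss start near 2.2? A rough sketch of the arithmetic, assuming the model's 81 weights form a 9 x 9 table (9 distinct characters) and that an untrained model guesses every character with equal probability. Neither assumption is checked against tiny_lm.py here, so treat the variable names below as illustrative only.

```python
import numpy as np

# Assumption: 81 weights = a 9 x 9 table, i.e. 9 distinct characters.
vocab_size = 9

# A model that knows nothing gives every character the same probability.
uniform = np.full(vocab_size, 1.0 / vocab_size)

# Cross-entropy is -log(probability given to the character that really came next).
loss_untrained = -np.log(uniform[0])

# exp(-loss) converts an average loss back into the (geometric-mean)
# probability the model gives to the correct next character.
p_before = np.exp(-2.2)
p_after = np.exp(-0.5)

print(f"loss with uniform guesses: {loss_untrained:.2f}")
print(f"probability on the right char: {p_before:.2f} -> {p_after:.2f}")
```

Under those assumptions, "2.2" is simply the loss of pure guessing among 9 characters, and a lower loss means the model puts more probability on the character that actually follows.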
Step 3: Read the loop
Open tiny_lm.py. Find the four steps: predict (softmax(W[x_ids])), error (cross-entropy),
nudge the weights (the gradient + W -= lr*grad), repeated in the loop; then generate (sample
the next char, append, repeat).
You should now see / say: "predict → measure error → nudge weights → repeat, then sample to generate." That loop is how every LLM is trained; this one just has 81 weights instead of billions.
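If you want to see the four steps side by side, here is a minimal sketch of a character-level model of this kind. It is not the contents of tiny_lm.py: the training text, learning rate, step count, and helper names (loss_and_grad, stoi) are assumptions chosen for illustration. Only softmax(W[x_ids]), cross-entropy, and W -= lr*grad come from the lab.

```python
import numpy as np

text = "hello world. hello world."      # placeholder training text (assumption)
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
ids = np.array([stoi[c] for c in text])
x_ids, y_ids = ids[:-1], ids[1:]         # each character predicts the next one
V = len(chars)

W = np.zeros((V, V))                     # row = current char, columns = scores for next char
lr = 1.0                                 # illustrative learning rate

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grad(W):
    n = len(x_ids)
    probs = softmax(W[x_ids])                            # 1. predict
    loss = -np.log(probs[np.arange(n), y_ids]).mean()    # 2. measure error (cross-entropy)
    dlogits = probs.copy()
    dlogits[np.arange(n), y_ids] -= 1.0                  # gradient of the loss w.r.t. the scores
    dlogits /= n
    grad = np.zeros_like(W)
    np.add.at(grad, x_ids, dlogits)                      # accumulate rows that repeat in x_ids
    return loss, grad

loss_before, _ = loss_and_grad(W)
for step in range(500):
    loss, grad = loss_and_grad(W)
    W -= lr * grad                                       # 3. nudge the weights
loss_after, _ = loss_and_grad(W)

# 4. generate: sample the next char, append, repeat
rng = np.random.default_rng(0)
idx = stoi["h"]
out = ["h"]
for _ in range(30):
    p = softmax(W[[idx]])[0]
    idx = rng.choice(V, p=p)
    out.append(chars[idx])
generated = "".join(out)

print(f"loss before: {loss_before:.2f}, after: {loss_after:.2f}")
print(repr(generated))
```

The placeholder text happens to have 9 distinct characters, so W here is also a 9 x 9 table of 81 weights. The exact losses and generated string depend on the text and settings you pick.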
Step 4: Train it on YOUR text
Open train_your_text.py, set TEXT to your own sentences (longer + more repetitive = clearer
patterns), and run it.
You should now see: a final loss and a generated snippet that echoes your text's patterns. The model learned from the data you gave it: change the data, change what it learns.
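To see what "patterns" the model can pick up from your text, you can count them directly. This sketch assumes the Part A model is a table with one row and one column per distinct character (consistent with 81 = 9 x 9 weights); the TEXT value and the names follows, vocab_size, and n_weights are placeholders, not taken from train_your_text.py.

```python
from collections import Counter, defaultdict

TEXT = "the cat sat on the mat. the cat sat on the mat."  # placeholder: use your own text

chars = sorted(set(TEXT))
vocab_size = len(chars)
n_weights = vocab_size * vocab_size   # assumption: one weight per (current char, next char) pair
n_examples = len(TEXT) - 1            # every char except the last has a "next char" to predict

# Count which character follows which. A table-style model like this can, at best,
# learn these frequencies; repetition in TEXT is what makes them sharp.
follows = defaultdict(Counter)
for cur, nxt in zip(TEXT, TEXT[1:]):
    follows[cur][nxt] += 1

print(f"{vocab_size} distinct chars -> {n_weights} weights, {n_examples} training examples")
print("after 't':", dict(follows["t"]))
```

A longer, more repetitive TEXT gives each count more evidence behind it, which is why the generated snippet echoes it more clearly.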
Part B: a real transformer, in miniature
Step 5: (Optional) run the miniature GPT
pip install torch # Python 3.10-3.12
python nanogpt_mini.py
You should now see: the loss printed about every 100 steps, going down, then a generated chunk of text. This one has real self-attention and transformer blocks: the actual LLM architecture, shrunk. (No torch, or on Python 3.13+? Skip running it; reading it (Step 6) is the real goal.)
Step 6: Find the transformer's pieces
Open nanogpt_mini.py. Locate: token + position embeddings (self.tok, self.pos),
self-attention with the causal mask (in Block), the MLP + residuals + layer-norm,
and the head that predicts the next token.
You should now see / say: the pieces of a real transformer, embeddings → attention → blocks → predict, and that the training loop is identical to Part A (predict → loss → nudge → repeat).
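If the causal mask is the hardest piece to picture, this sketch shows one attention head with a residual connection. It uses numpy rather than PyTorch so it runs without torch, and it is not the code in nanogpt_mini.py: the sizes, the random weights, and every variable name are assumptions for illustration, and Block may organise the same idea differently.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8                          # sequence length and embedding width (illustrative)
x = rng.normal(size=(T, d))          # stand-in for token + position embeddings

# One attention head: project the input to queries, keys, and values.
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv

scores = q @ k.T / np.sqrt(d)        # (T, T): how much position i looks at position j

# Causal mask: position i may only look at positions <= i, never at the future.
future = np.triu(np.ones((T, T), dtype=bool), k=1)
scores = np.where(future, -np.inf, scores)

# Softmax over each row; masked entries become exactly 0.
scores = scores - scores.max(axis=1, keepdims=True)
weights = np.exp(scores)
weights = weights / weights.sum(axis=1, keepdims=True)

out = x + weights @ v                # residual connection: add attention output to the input

print(np.round(weights, 2))          # lower-triangular: no attention to later positions
```

The printed matrix has zeros above the diagonal, which is the mask at work: when predicting the next token, the model cannot peek at it.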
Step 7: Scale it up in your head
Compare: this model has thousands of weights and was trained on one sentence. A frontier LLM has billions of weights and was trained on much of the internet for GPU-months, followed by fine-tuning and RLHF (M15).
You should now see / say: "same loop + same architecture + vastly more scale, data, and compute = an LLM." The black box is now a glass box.
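To put rough numbers on "thousands vs billions", here is a back-of-envelope weight count for a GPT-style model. Everything in it is an assumption: the layout (biases everywhere, an MLP four times wider than the embedding, an untied output head) and both configurations are hypothetical, and neither is the actual configuration of nanogpt_mini.py or of any particular production model.

```python
def gpt_param_count(vocab_size, context_len, d_model, n_layers):
    """Approximate weight count for a GPT-style transformer (assumed layout)."""
    embeddings = vocab_size * d_model + context_len * d_model   # token + position tables
    attention = 4 * d_model * d_model + 4 * d_model             # q, k, v, output projection (+ biases)
    mlp = 8 * d_model * d_model + 5 * d_model                   # d -> 4d -> d (+ biases)
    layer_norms = 4 * d_model                                   # two per block, scale + shift each
    per_block = attention + mlp + layer_norms
    final_norm = 2 * d_model
    head = d_model * vocab_size                                 # next-token scores, no bias
    return embeddings + n_layers * per_block + final_norm + head

# Hypothetical "miniature" config: character vocabulary, two small blocks.
tiny = gpt_param_count(vocab_size=9, context_len=16, d_model=16, n_layers=2)

# Hypothetical large config, only to show how the same formula scales.
big = gpt_param_count(vocab_size=32_000, context_len=2_048, d_model=4_096, n_layers=32)

print(f"tiny: {tiny:,} weights")
print(f"big:  {big:,} weights")
print(f"ratio: about {big // tiny:,}x")
```

The formula is identical for both; only the four numbers going in change, which is the sense in which the architecture stays the same while the scale grows.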
Stuck? Working code is in ../solution/ (tiny_lm.py + nanogpt_mini.py).
Your win
You trained a language model from scratch (loss dropped, it generated text), and you can name the transformer's pieces, so LLMs aren't magic anymore.
Post it to the chat wins board: "Trained a model from scratch, loss 2.2 → 0.5, and it babbles in the style of my text. I see what's under the hood now."
Take-home (optional)
Watch the first hour of Andrej Karpathy's "Neural Networks: Zero to Hero" (or skim nanoGPT). He builds from exactly this kind of tiny model up to a real GPT, line by line, and you'll recognize every piece. Then come back to building with models (the other 16 modules); that's the engineer's job.