M17 solution

The expected, fully-commented artifacts for M17's lab. Peek only after you've tried.

| File | What it is |
| --- | --- |
| tiny_lm.py | A language model trained from scratch in ~50 lines of numpy: char tokenizer, a weight matrix, gradient-descent training (loss drops), and text generation. No GPU, no key. |
| nanogpt_mini.py | A real transformer in miniature (PyTorch): token + position embeddings, causal self-attention, blocks (MLP + residual + layer-norm), training loop, generation. The actual LLM architecture, shrunk. |
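To make the tiny_lm.py row concrete, here is a minimal sketch of the same ingredients (char tokenizer, one weight matrix, gradient descent on cross-entropy, sampling). It is not the contents of tiny_lm.py: the function names `train_bigram` and `generate`, the bigram formulation, and the hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

def train_bigram(text, steps=200, lr=1.0, seed=0):
    """Hypothetical sketch: a character-level bigram model.

    Row i of W holds the logits for the character that follows character i.
    """
    chars = sorted(set(text))                      # char tokenizer: vocabulary
    stoi = {c: i for i, c in enumerate(chars)}
    ids = np.array([stoi[c] for c in text])
    x, y = ids[:-1], ids[1:]                       # inputs and next-char targets
    vocab = len(chars)
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 0.01, size=(vocab, vocab))
    rows = np.arange(len(y))
    losses = []
    for _ in range(steps):
        logits = W[x]                              # (N, vocab), one row per input char
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)  # softmax
        losses.append(-np.log(probs[rows, y]).mean())         # cross-entropy
        grad = probs.copy()                        # d(loss)/d(logits) = probs - onehot
        grad[rows, y] -= 1.0
        grad /= len(y)
        dW = np.zeros_like(W)
        np.add.at(dW, x, grad)                     # accumulate over repeated input chars
        W -= lr * dW                               # gradient-descent step
    return W, chars, losses

def generate(W, chars, start, n=40, seed=0):
    """Sample n characters after `start` from the trained bigram table."""
    rng = np.random.default_rng(seed)
    stoi = {c: i for i, c in enumerate(chars)}
    i = stoi[start]
    out = [start]
    for _ in range(n):
        logits = W[i] - W[i].max()
        p = np.exp(logits)
        p /= p.sum()
        i = rng.choice(len(chars), p=p)            # sample the next character id
        out.append(chars[i])
    return "".join(out)

if __name__ == "__main__":
    W, chars, losses = train_bigram("hello world, hello numpy. " * 4)
    print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}")
    print(generate(W, chars, "h"))
```

The loss starts near log(vocabulary size) because the weights start near zero, then falls as the rows of W learn which characters tend to follow which.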

Run it

pip install numpy
python tiny_lm.py            # trains in ~1s; watch loss drop, see it generate

pip install torch           # Python 3.10-3.12 (optional, for the real transformer)
python nanogpt_mini.py      # trains a tiny GPT in ~a minute on CPU

How this was verified

  • tiny_lm.py was trained and run for real (numpy 2.4.x, Python 3): the loss dropped from 2.197 to 0.507, and it generated text echoing the training data's patterns. The starter (train_your_text.py) likewise trains and generates.
  • nanogpt_mini.py is syntax-verified (it compiles) and follows the standard transformer structure (token/position embeddings, causal-masked multi-head attention, MLP blocks with residuals + layer-norm, cross-entropy training, sampling generation). torch has no wheel for the sandbox's Python 3.14, so training must be run on Python 3.10-3.12 (on CPU it takes about a minute).
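The causal masking mentioned for nanogpt_mini.py can be illustrated without torch. The following is a minimal single-head sketch in numpy, not the code in nanogpt_mini.py: the function name `causal_self_attention` and the weight shapes are assumptions, and the real file uses multi-head attention in PyTorch.

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Hypothetical single-head sketch. x is (T, C); each W is (C, C).

    Position t may attend only to positions <= t.
    """
    T = x.shape[0]
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])             # scaled dot-product, (T, T)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(future, -np.inf, scores)          # hide future positions
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)                            # exp(-inf) = 0 for masked entries
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ v, weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 8))
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    out, weights = causal_self_attention(x, Wq, Wk, Wv)
    print(np.round(weights, 2))   # lower-triangular attention pattern
```

A quick way to check the mask is to perturb the last token and confirm that the outputs at all earlier positions are unchanged.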

No API key or network is used anywhere in M17; it is pure local training. This is the course's one "how models are built" deep dive; the engineer's job is the other 16 modules (building with models).