M17 solution
The expected, fully-commented artifacts for M17's lab. Peek only after you've tried.
| File | What it is |
|---|---|
| tiny_lm.py | A language model trained from scratch in ~50 lines of numpy: char tokenizer, a weight matrix, gradient-descent training (loss drops), and text generation. No GPU, no key. |
| nanogpt_mini.py | A real transformer in miniature (PyTorch): token + position embeddings, causal self-attention, blocks (MLP + residual + layer-norm), training loop, generation. The actual LLM architecture, shrunk. |
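
The ingredients listed for tiny_lm.py (char tokenizer, one weight matrix, gradient descent, generation) can be sketched roughly as below. This is an illustrative bigram model, not the contents of tiny_lm.py: the function names `train_bigram` and `generate`, the toy training string, and the hyperparameters are all assumptions made for this example.

```python
import numpy as np

def train_bigram(text, steps=200, lr=1.0, seed=0):
    """Train a char-level bigram model: a single (V, V) weight matrix."""
    chars = sorted(set(text))                      # char tokenizer: one id per unique char
    stoi = {c: i for i, c in enumerate(chars)}
    ids = np.array([stoi[c] for c in text])
    x, y = ids[:-1], ids[1:]                       # predict the next char from the current one
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 0.01, size=(len(chars), len(chars)))
    rows = np.arange(len(y))
    losses = []
    for _ in range(steps):
        logits = W[x]                              # row lookup, same as one-hot(x) @ W
        logits = logits - logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)  # softmax over the vocabulary
        losses.append(-np.log(probs[rows, y]).mean())  # cross-entropy loss
        grad = probs                               # d(loss)/d(logits) = probs - one-hot(y)
        grad[rows, y] -= 1.0
        grad /= len(y)
        dW = np.zeros_like(W)
        np.add.at(dW, x, grad)                     # accumulate gradients per input char
        W -= lr * dW                               # plain gradient-descent step
    return W, chars, losses

def generate(W, chars, start, n=40, seed=0):
    """Sample n characters, one at a time, starting from `start`."""
    rng = np.random.default_rng(seed)
    i = chars.index(start)
    out = [start]
    for _ in range(n):
        logits = W[i] - W[i].max()
        p = np.exp(logits)
        p /= p.sum()
        i = rng.choice(len(chars), p=p)            # sample the next char id
        out.append(chars[i])
    return "".join(out)

# Toy training text (an assumption for this sketch, not the lab's data).
W, chars, losses = train_bigram("hello world, hello numpy. " * 8)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}")
print(generate(W, chars, "h"))
```

Because each input is a single character, the "model" is just a lookup table of next-character logits. That is enough to watch the loss fall and to see generated text pick up the training data's patterns.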
Run it
pip install numpy
python tiny_lm.py # trains in ~1s; watch loss drop, see it generate
pip install torch # Python 3.10-3.12 (optional, for the real transformer)
python nanogpt_mini.py # trains a tiny GPT in ~a minute on CPU
How this was verified
tiny_lm.py was trained and run for real (numpy 2.4.x, Python 3): loss dropped 2.197 → 0.507 and it generated text echoing the training data's patterns. The starter (train_your_text.py) likewise trains and generates.

nanogpt_mini.py is syntax-verified (it compiles) and follows the standard transformer structure (token/position embeddings, causal-masked multi-head attention, MLP blocks with residuals + layer-norm, cross-entropy training, sampling generation). torch has no wheel for the sandbox's Python 3.14, so training it must be piloted on Python 3.10-3.12 (runs on CPU in ~a minute).
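To make "causal-masked attention" concrete, here is a minimal single-head sketch in numpy. It is not the PyTorch code in nanogpt_mini.py: the function name `causal_self_attention`, the shapes, and the random weights are assumptions for illustration, and it omits multiple heads, the output projection, and dropout.

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention over one sequence x of shape (T, C)."""
    T = x.shape[0]
    q, k, v = x @ Wq, x @ Wk, x @ Wv                  # queries, keys, values: each (T, d)
    scores = q @ k.T / np.sqrt(k.shape[-1])           # scaled dot-product scores, (T, T)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(future, -np.inf, scores)        # a position may not look ahead
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v, weights

# Hypothetical sizes: 5 tokens, 8-dim embeddings, 4-dim head.
rng = np.random.default_rng(0)
T, C, d = 5, 8, 4
x = rng.normal(size=(T, C))
Wq, Wk, Wv = (rng.normal(size=(C, d)) for _ in range(3))
out, weights = causal_self_attention(x, Wq, Wk, Wv)
print(np.round(weights, 2))  # lower-triangular: each row attends only to itself and earlier tokens
```

The mask is what makes the model usable for generation: the output at position t depends only on tokens up to t, so changing a later token leaves earlier outputs unchanged.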
No API key or network is used anywhere in M17; it's pure local training. This is the course's one "how models are built" deep-dive; the engineer's job is the other 16 modules (building with models).