Install & config guide: Ollama (run open models locally)
Needed at M13. Ollama lets you download and run open models (Llama, Gemma, Qwen…) on your own computer: no API key, no internet once downloaded, no per-token bill, and your data never leaves your machine. It runs the model and exposes a small local web API at
http://localhost:11434that your code calls just like the hosted API in M4.
Official docs (verify they open): https://ollama.com · model list: https://ollama.com/library.
Hardware note: models run on your CPU (and GPU if you have one). Stick to small models on a laptop,
llama3.2(~1-2 GB),gemma2:2b, orqwen2.5:0.5b. Big models need lots of RAM and are slow on a laptop. A small model is perfect for learning.
Step 1: Install Ollama
- macOS / Windows: download the app from https://ollama.com/download, run the installer, and launch it (it runs quietly in the background and starts the local server).
- Linux:
curl -fsSL https://ollama.com/install.sh | sh
You should now see: Ollama installed. Check it in a terminal:
ollama --version
Step 2: Pull a small model
ollama pull llama3.2
success. (One-time download; it's cached
after that.) See what you have with ollama list.
Step 3: Chat with it in the terminal (no code yet)
ollama run llama3.2
/bye to exit.
You should now see: the model replying in your terminal, generated entirely on your machine, no internet needed after the download.
Step 4: Confirm the local API is up
Your code talks to Ollama over HTTP. Check the server responds:
curl http://localhost:11434/api/tags
localhost:11434) is
what chat_local.py posts to.
Step 5: Install the Python HTTP library
With your venv active:
pip install requests
Successfully installed requests-…. Now python chat_local.py can call
your local model.
Troubleshooting
| Symptom | Fix |
|---|---|
ollama: command not found |
Installer didn't finish, or open a new terminal. On macOS/Windows make sure the Ollama app is running. |
| Python error "couldn't reach localhost:11434" | Ollama isn't running. Start the app, or run ollama serve in a terminal; confirm with ollama list. |
| Model reply is very slow | The model is too big for your machine, pull a smaller one (gemma2:2b, qwen2.5:0.5b) and set MODEL to it. |
model "x" not found |
You haven't pulled it. Run ollama pull x first; ollama list shows what you have. |
| Out of memory / crash | Same fix, use a smaller model; close other heavy apps. |
No API key here, and that's the point. Local models cost nothing per call and keep data on your machine. The trade-off is capability and speed: a small local model won't match a frontier hosted model, and your laptop is slower than a data center. Use local for privacy/offline/cost; hosted for top quality (this is the M0 "how to choose a model" decision, made real).