Install & config guide: Ollama (run open models locally)

Needed at M13. Ollama lets you download and run open models (Llama, Gemma, Qwen…) on your own computer: no API key, no internet once downloaded, no per-token bill, and your data never leaves your machine. It runs the model and exposes a small local web API at http://localhost:11434 that your code calls just like the hosted API in M4.

Official docs (verify they open): https://ollama.com · model list: https://ollama.com/library.

Hardware note: models run on your CPU (and GPU if you have one). Stick to small models on a laptop, llama3.2 (~1-2 GB), gemma2:2b, or qwen2.5:0.5b. Big models need lots of RAM and are slow on a laptop. A small model is perfect for learning.

Step 1: Install Ollama

macOS / Windows: download the app from https://ollama.com/download, run the installer, and launch it (it runs quietly in the background and starts the local server).

Linux:

curl -fsSL https://ollama.com/install.sh | sh

You should now see: Ollama installed. Check it in a terminal:

ollama --version

→ prints a version number.

Step 2: Pull a small model

ollama pull llama3.2

You should now see: a download progress bar, then success. (One-time download; it's cached after that.) See what you have with ollama list.

Step 3: Chat with it in the terminal (no code yet)

ollama run llama3.2

Type a message; type /bye to exit.

You should now see: the model replying in your terminal, generated entirely on your machine, no internet needed after the download.

Step 4: Confirm the local API is up

Your code talks to Ollama over HTTP. Check the server responds:

curl http://localhost:11434/api/tags

You should now see: a JSON list of your installed models. That endpoint (localhost:11434) is what chat_local.py posts to.

Step 5: Install the Python HTTP library

With your venv active:

pip install requests

You should now see: Successfully installed requests-…. Now python chat_local.py can call your local model.

Troubleshooting

Symptom	Fix
`ollama: command not found`	Installer didn't finish, or open a new terminal. On macOS/Windows make sure the Ollama app is running.
Python error "couldn't reach localhost:11434"	Ollama isn't running. Start the app, or run `ollama serve` in a terminal; confirm with `ollama list`.
Model reply is very slow	The model is too big for your machine, pull a smaller one (`gemma2:2b`, `qwen2.5:0.5b`) and set `MODEL` to it.
`model "x" not found`	You haven't pulled it. Run `ollama pull x` first; `ollama list` shows what you have.
Out of memory / crash	Same fix, use a smaller model; close other heavy apps.

No API key here, and that's the point. Local models cost nothing per call and keep data on your machine. The trade-off is capability and speed: a small local model won't match a frontier hosted model, and your laptop is slower than a data center. Use local for privacy/offline/cost; hosted for top quality (this is the M0 "how to choose a model" decision, made real).