Skip to content

Install & config guide: Ollama (run open models locally)

Needed at M13. Ollama lets you download and run open models (Llama, Gemma, Qwen…) on your own computer: no API key, no internet once downloaded, no per-token bill, and your data never leaves your machine. It runs the model and exposes a small local web API at http://localhost:11434 that your code calls just like the hosted API in M4.

Official docs (verify they open): https://ollama.com · model list: https://ollama.com/library.

Hardware note: models run on your CPU (and GPU if you have one). Stick to small models on a laptop, llama3.2 (~1-2 GB), gemma2:2b, or qwen2.5:0.5b. Big models need lots of RAM and are slow on a laptop. A small model is perfect for learning.


Step 1: Install Ollama

  • macOS / Windows: download the app from https://ollama.com/download, run the installer, and launch it (it runs quietly in the background and starts the local server).
  • Linux:
    curl -fsSL https://ollama.com/install.sh | sh
    

You should now see: Ollama installed. Check it in a terminal:

ollama --version
→ prints a version number.

Step 2: Pull a small model

ollama pull llama3.2
You should now see: a download progress bar, then success. (One-time download; it's cached after that.) See what you have with ollama list.

Step 3: Chat with it in the terminal (no code yet)

ollama run llama3.2
Type a message; type /bye to exit.

You should now see: the model replying in your terminal, generated entirely on your machine, no internet needed after the download.

Step 4: Confirm the local API is up

Your code talks to Ollama over HTTP. Check the server responds:

curl http://localhost:11434/api/tags
You should now see: a JSON list of your installed models. That endpoint (localhost:11434) is what chat_local.py posts to.

Step 5: Install the Python HTTP library

With your venv active:

pip install requests
You should now see: Successfully installed requests-…. Now python chat_local.py can call your local model.


Troubleshooting

Symptom Fix
ollama: command not found Installer didn't finish, or open a new terminal. On macOS/Windows make sure the Ollama app is running.
Python error "couldn't reach localhost:11434" Ollama isn't running. Start the app, or run ollama serve in a terminal; confirm with ollama list.
Model reply is very slow The model is too big for your machine, pull a smaller one (gemma2:2b, qwen2.5:0.5b) and set MODEL to it.
model "x" not found You haven't pulled it. Run ollama pull x first; ollama list shows what you have.
Out of memory / crash Same fix, use a smaller model; close other heavy apps.

No API key here, and that's the point. Local models cost nothing per call and keep data on your machine. The trade-off is capability and speed: a small local model won't match a frontier hosted model, and your laptop is slower than a data center. Use local for privacy/offline/cost; hosted for top quality (this is the M0 "how to choose a model" decision, made real).