Skip to content

Lab: M13: run an open model on your own machine

You'll need: your laptop with a terminal and your Python venv. New installs: Ollama and requests. No API key. Time: ~50 min • Work in your breakout pair.

Heads up: use a small model (this lab uses llama3.2). Big models are slow or won't fit on a laptop, if a reply crawls, switch to gemma2:2b or qwen2.5:0.5b. Nothing here can harm your computer, and there's no bill, it's all running locally.

This lab has two parts: - Part A: install Ollama, pull a model, chat in the terminal. - Part B: call the local model from Python, and compare local vs hosted.

flowchart LR
  You["your Python (no key)"] -->|"POST localhost:11434"| Ollama["Ollama runs the model<br/>on your machine"]
  Ollama --> Reply["reply, generated locally, offline"]

Part A: get a model running locally

Step 1: Install Ollama

Install it for your OS (see the Ollama guide), then:

ollama --version
You should now see: a version number, Ollama is installed and its background server is running.

Step 2: Pull a small model

ollama pull llama3.2
You should now see: a download bar, then success. Check it with ollama list, your model is now on your machine (a one-time ~1-2 GB download).

Step 3: Chat in the terminal (no code, no internet)

ollama run llama3.2
Ask it something; type /bye to leave. (Try turning off Wi-Fi first, it still works.)

You should now see: the model replying in your terminal, generated entirely on your computer, no key, no internet, no bill. That's an open model running locally.

Step 4: Confirm the local API

Your code will talk to Ollama over HTTP. Check the server:

curl http://localhost:11434/api/tags
You should now see: a JSON list of your installed models. localhost:11434 is the local API your Python will call.


Part B: call it from Python, and compare

Step 5: Set up and run the client

pip install requests
Put chat_local.py (from solution/) and chat_local_starter.py (from starters/) in a folder, then:
python chat_local.py
Ask it a few things; type quit to exit.

You should now see: replies from your local model, and notice there's no .env, no key anywhere. Your Python is talking to localhost, not the internet.

Step 6: See it's the same shape as M4

Open chat_local.py and compare its request to M4's hosted call: both send a model and messages and read back a reply. The only differences: the URL is localhost, and there's no API key.

You should now see / say: "a local model is just a model served on my own computer, same API shape, no key." You already knew how to talk to a model; now it runs on your machine.

Step 7: Local vs hosted (the real lesson)

Ask your local model and your M4 Claude chatbot the same harder question (e.g. "Write a Python function to find prime numbers and explain the trade-offs"). Compare quality and speed.

You should now see: the local model is free, private, offline but usually less capable and slower than the hosted frontier model. That trade-off, capability vs cost/privacy, is model selection (M0), now with your own evidence.

Step 8: Try a different model (finish the starter)

In chat_local_starter.py, pull and switch to another small model (ollama pull gemma2:2b, set MODEL = "gemma2:2b"), and run it.

You should now see: a different open model answering, different style/quality. You can now run any open model on your machine with a one-line change.

Stuck? The finished client is ../solution/chat_local.py.


Your win

You ran an open model entirely on your own machine, from the terminal and from Python, for free, offline, and private, and you can explain when local beats hosted (and when it doesn't).

Post it to the chat wins board: "Chatted with Llama running on MY laptop, no key, Wi-Fi off, $0. Slower than Claude, but it's all mine "

Take-home (optional)

Skim Hugging Face (huggingface.co/models) and notice how many open models exist (and LM Studio for a GUI alternative to Ollama). Pick one model you'd try and one reason. The open ecosystem is huge, you now know how to tap it.