Lab: M13: run an open model on your own machine
You'll need: your laptop with a terminal and your Python venv. New installs: Ollama and
requests. No API key. Time: ~50 min • Work in your breakout pair.
Heads up: use a small model (this lab uses
llama3.2). Big models are slow or won't fit on a laptop, if a reply crawls, switch togemma2:2borqwen2.5:0.5b. Nothing here can harm your computer, and there's no bill, it's all running locally.
This lab has two parts: - Part A: install Ollama, pull a model, chat in the terminal. - Part B: call the local model from Python, and compare local vs hosted.
flowchart LR
You["your Python (no key)"] -->|"POST localhost:11434"| Ollama["Ollama runs the model<br/>on your machine"]
Ollama --> Reply["reply, generated locally, offline"]
Part A: get a model running locally
Step 1: Install Ollama
Install it for your OS (see the Ollama guide), then:
ollama --version
Step 2: Pull a small model
ollama pull llama3.2
success. Check it with ollama list, your model
is now on your machine (a one-time ~1-2 GB download).
Step 3: Chat in the terminal (no code, no internet)
ollama run llama3.2
/bye to leave. (Try turning off Wi-Fi first, it still works.)
You should now see: the model replying in your terminal, generated entirely on your computer, no key, no internet, no bill. That's an open model running locally.
Step 4: Confirm the local API
Your code will talk to Ollama over HTTP. Check the server:
curl http://localhost:11434/api/tags
localhost:11434 is the local API
your Python will call.
Part B: call it from Python, and compare
Step 5: Set up and run the client
pip install requests
chat_local.py (from solution/) and chat_local_starter.py (from
starters/) in a folder, then:
python chat_local.py
quit to exit.
You should now see: replies from your local model, and notice there's no .env, no key
anywhere. Your Python is talking to localhost, not the internet.
Step 6: See it's the same shape as M4
Open chat_local.py and compare its request to M4's hosted call: both send a model and messages
and read back a reply. The only differences: the URL is localhost, and there's no API key.
You should now see / say: "a local model is just a model served on my own computer, same API shape, no key." You already knew how to talk to a model; now it runs on your machine.
Step 7: Local vs hosted (the real lesson)
Ask your local model and your M4 Claude chatbot the same harder question (e.g. "Write a Python function to find prime numbers and explain the trade-offs"). Compare quality and speed.
You should now see: the local model is free, private, offline but usually less capable and slower than the hosted frontier model. That trade-off, capability vs cost/privacy, is model selection (M0), now with your own evidence.
Step 8: Try a different model (finish the starter)
In chat_local_starter.py, pull and switch to another small model (ollama pull gemma2:2b, set
MODEL = "gemma2:2b"), and run it.
You should now see: a different open model answering, different style/quality. You can now run any open model on your machine with a one-line change.
Stuck? The finished client is
../solution/chat_local.py.
Your win
You ran an open model entirely on your own machine, from the terminal and from Python, for free, offline, and private, and you can explain when local beats hosted (and when it doesn't).
Post it to the chat wins board: "Chatted with Llama running on MY laptop, no key, Wi-Fi off, $0. Slower than Claude, but it's all mine "
Take-home (optional)
Skim Hugging Face (huggingface.co/models) and notice how many open models exist (and LM Studio for a GUI alternative to Ollama). Pick one model you'd try and one reason. The open ecosystem is huge, you now know how to tap it.