
Lab: M0: meet the models

You'll need: a web browser and access to two AI chat assistants you can already use (most have free tiers: Claude at claude.ai, ChatGPT at chatgpt.com, Gemini at gemini.google.com, Copilot, Meta AI…). No install, no code, no API key.

Time: ~20 minutes • Work in your breakout pair: compare what each of you sees.

Heads up: there are no wrong answers here. You're observing how two models differ and where they fall short. Noticing a model get something wrong is a success in this lab, not a failure.

flowchart LR
  Prompt["one prompt"] --> A["Model A"]
  Prompt --> B["Model B"]
  A --> Compare["you compare:<br/>tone · length · accuracy · cutoff"]
  B --> Compare

Use the worksheet in ../starters/model-comparison.md to jot answers as you go.


Step 1: Open two models side by side

Open two different AI chat assistants in two browser tabs (e.g. Claude and ChatGPT). Sign in to the free tier if asked.

You should now see: two chat boxes ready, from two different makers. (Two different models, not the same one twice.)

Step 2: Same prompt, both models

Paste this exact prompt into both:

"Explain what an API is to a curious 12-year-old, in exactly 3 sentences."

You should now see: two answers that are both about APIs but differ in tone, length, choice of analogy, and whether they actually used exactly 3 sentences. Same task, different model, different result. (That difference is why model choice matters.)
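Optional, for the curious: the lab itself needs no code, but if you'd rather not count sentences by eye, here is a minimal Python sketch. The function name `count_sentences` and the sample answer are made up for illustration, and the splitting rule is deliberately crude (it treats every `.`, `!`, or `?` followed by a space as a sentence end, so abbreviations like "e.g." will be miscounted).

```python
import re


def count_sentences(text: str) -> int:
    """Roughly count sentences by splitting after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty pieces so an empty answer counts as 0 sentences.
    return len([p for p in parts if p])


# Hypothetical model answer, pasted in as plain text.
answer = (
    "An API is like a waiter in a restaurant. "
    "You tell it what you want. "
    "It brings back your order from the kitchen."
)
print(count_sentences(answer))  # 3
```

Paste each model's reply in place of `answer` to check whether it followed the "exactly 3 sentences" instruction.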

Step 3: Make one show its knowledge cutoff

Ask one model something very recent or niche, e.g.:

"Who won the most recent Formula 1 race, and on what date?"

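You should now see: one of three things. The model says it doesn't know (or mentions a knowledge cutoff), gives an out-of-date answer, or makes something up (a hallucination). Note which happened. (If the assistant can search the web, it may answer correctly; note that too.) That gap, "it doesn't know recent or private things", is the problem RAG (retrieval-augmented generation, M7) is designed to address.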

Step 4: Look up each model's specs

On each maker's pricing or docs page (no login needed), find for each model: its context window (in tokens) and its rough price per million tokens. Jot them in the worksheet.

You should now see: two rows of specs, e.g. "Model A: ~200K context, \$X in / \$Y out per million tokens", and a sense that bigger/newer usually costs more. You're reading a model's spec sheet like a pro.
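Optional: to get a feel for what "price per million tokens" means in practice, here is a small Python sketch of the arithmetic. The function name `request_cost` and every number in it are hypothetical placeholders, not any maker's real prices; swap in the figures you wrote on your worksheet.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 price_in_per_million: float, price_out_per_million: float) -> float:
    """Dollar cost of one request, given separate input and output prices per million tokens."""
    cost_in = input_tokens / 1_000_000 * price_in_per_million
    cost_out = output_tokens / 1_000_000 * price_out_per_million
    return cost_in + cost_out


# Hypothetical example: a 2,000-token prompt and a 500-token reply,
# priced at $3.00 per million input tokens and $15.00 per million output tokens.
cost = request_cost(2_000, 500, price_in_per_million=3.00, price_out_per_million=15.00)
print(f"${cost:.4f}")  # $0.0135
```

A single request costs a fraction of a cent at these made-up prices; the numbers only start to matter once you multiply by many requests.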

Step 5: Choose for a job

For each scenario, pick which of your two models you'd use, and write one reason:

- a) a free, quick personal chat assistant
- b) an app that must keep all data on your own computer (offline/private)

You should now see: a choice + reason for each. (Hint for b: that points toward an open-source model you run locally, covered in a later module, rather than a hosted one.)


Your win

You can explain what AI engineering is and roughly how an LLM works, and you've compared two real models, caught a knowledge-cutoff gap or a hallucination, and reasoned about which model fits which job.

Post it to the chat wins board: one surprising difference between your two models, e.g. "Same prompt: Claude gave a tidy 3-sentence answer, the other wrote 6. And one made up an F1 result!"

Take-home (optional)

Skim one maker's model page and notice they offer several models (for example, a big capable one and a small fast one). Which would you pick for a high-volume, simple task, and why? (Cost vs. capability: the exact trade-off you'll make in M6.)
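Optional: if you want to put numbers on that trade-off, the Python sketch below compares a month of traffic on a "big" and a "small" model. Everything here is an assumption for illustration: the function name `monthly_cost`, the traffic volume, the token counts, and both price pairs are invented, so replace them with the figures from the model page you skimmed.

```python
def monthly_cost(requests_per_day: int, input_tokens: int, output_tokens: int,
                 price_in: float, price_out: float, days: int = 30) -> float:
    """Estimated monthly spend; prices are dollars per million tokens."""
    per_request = (input_tokens * price_in + output_tokens * price_out) / 1_000_000
    return per_request * requests_per_day * days


# Hypothetical workload: 10,000 short requests a day (500 tokens in, 100 tokens out).
big = monthly_cost(10_000, 500, 100, price_in=3.00, price_out=15.00)
small = monthly_cost(10_000, 500, 100, price_in=0.25, price_out=1.25)

print(f"big model:   ${big:,.2f} per month")
print(f"small model: ${small:,.2f} per month")
print(f"the big model costs about {big / small:.0f}x more")
```

With these made-up prices the small model is far cheaper at volume, so the real question becomes whether it is capable enough for the task.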