Notes: M4: Ship your first AI app
Three modules of Python were the runway; this is takeoff. Today your code talks to a real large language model (LLM) and you build a chatbot you fully understand. The magic word for this module is demystify. Underneath, an AI app is your M3 toolkit: a function call that sends some text and gets some text back, with a secret key for the door. You need no neural-network maths to build with one.
What an LLM is, from a builder's view
A large language model is a program, running on someone else's powerful computers, that has read an enormous amount of text and learned to continue text plausibly. You give it some words (a prompt); it predicts good words to follow. That's the whole interface you need as a builder: text in → text out. Everything fancy (answering questions, writing code, role-playing a pirate) is that one trick, steered by what you put in.
You don't run the model on your laptop (it's far too big; recall M3's PyTorch box). Instead you call it over the internet through an API: an Application Programming Interface, a doorway one program opens for another. You send a request; the model's servers send a response.
flowchart LR
Code["Your Python<br/>messages + API key"] -->|HTTPS request| Model["Claude API<br/>(LLM on Anthropic's servers)"]
Model -->|response: text| Code
Code --> Screen["printed reply"]
The API key: a paid secret
To use the API you need an API key: a secret string (it looks like sk-ant-...) that does two
jobs: it proves the request is yours (authentication), and it's what your usage is billed
against. Treat it exactly like a password to a service you pay for. The whole of M4's setup is:
make a key → store it safely → prove it works.
Where the key lives matters more than anything else today. You keep it in a file named .env
(just KEY=value lines), load it at runtime with the python-dotenv library, and never write
it into your .py files or commit it to Git. Why so strict? A leaked key (e.g. pushed to a public
repo) can be found by bots in minutes and used to run up your bill. The habit (secrets in .env,
.env in .gitignore, only a placeholder .env.example in Git) is real security practice you'll
keep for every project after this. (Full steps + troubleshooting:
api-keys.md.)
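The "store it safely → prove it works" steps can be sketched in a few lines. This is a minimal sketch under some assumptions: the .env file sits in the folder you run the script from, the key is stored under the name ANTHROPIC_API_KEY (the variable the anthropic SDK looks for by default), and the helper names load_api_key and mask are made up for illustration.

```python
import os

ENV_VAR = "ANTHROPIC_API_KEY"  # the name the anthropic SDK reads by default


def load_api_key() -> str:
    """Return the API key from the environment, or fail with a clear message."""
    try:
        # python-dotenv copies KEY=value lines from .env into os.environ.
        from dotenv import load_dotenv
        load_dotenv(".env")  # assumption: .env is in the current folder
    except ImportError:
        pass  # no python-dotenv installed: rely on variables set in the shell

    key = os.environ.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set. Add it to your .env file.")
    return key


def mask(key: str) -> str:
    """Keep only the first 7 and last 4 characters, so the full key is never printed."""
    if len(key) <= 11:
        return "*" * len(key)
    return key[:7] + "..." + key[-4:]
```

Printing mask(load_api_key()) is a safe way to confirm the key loaded without ever showing the whole secret on screen or in a screenshot.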
The request: messages, system, and a model
A call to the model has a few parts. Here's the smallest real one:
import anthropic
from dotenv import load_dotenv

load_dotenv()  # copy KEY=value lines from .env into the environment
client = anthropic.Anthropic()  # reads your key from the environment

response = client.messages.create(
    model="claude-opus-4-8",  # WHICH model answers
    max_tokens=1024,  # cap on how long the reply can be
    system="You are a friendly tutor.",  # the model's personality + rules
    messages=[{"role": "user", "content": "Hi!"}],  # the conversation so far
)
print(response.content[0].text)  # the reply text
- messages is a list of turns, each a dictionary (your M2 dictionaries again!) with a role
("user" or "assistant") and content (the text). This list is the conversation.
- system is a special instruction that sets the model's role and rules, its personality. It's
not part of the back-and-forth; it's the standing brief. (You'll go deep on this in M5.)
- model picks which model answers (more below).
- max_tokens caps the reply length. A token is roughly ¾ of a word, the unit models read
and write in, and the unit you're billed in. (M6 explores this knob and others.)
- The reply comes back in response.content, a list of blocks; the text is content[0].text.
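To get a feel for tokens before M6, the "a token is roughly ¾ of a word" rule of thumb from the list above can be turned into a quick estimator. This is only a rough sketch: real tokenizers split text differently, the function names are hypothetical, and the numbers are ballpark figures, not what you will be billed.

```python
def estimate_tokens(text: str) -> int:
    """Rough token count, using the rule of thumb that one token is about 3/4 of a word."""
    words = len(text.split())
    # words is about 0.75 * tokens, so tokens is about words / 0.75
    return round(words / 0.75)


def estimate_conversation_tokens(messages: list[dict]) -> int:
    """Add up the rough token counts of every turn in a messages list."""
    return sum(estimate_tokens(turn["content"]) for turn in messages)


conversation = [
    {"role": "user", "content": "Hi there my friend"},
    {"role": "assistant", "content": "What is an API key please"},
]
print(estimate_conversation_tokens(conversation))  # 4 words + 6 words -> 5 + 8 = 13
```

Because the estimate works on the same list-of-dictionaries shape that messages uses, you can run it on any conversation before sending it.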
Why you send the whole conversation every time
Here's the idea that makes chatbots click: the API is stateless, meaning it remembers nothing
between calls. Each request is judged only on the messages you send this time. So to make a bot
that "remembers", you keep a running messages list and append both sides every turn: the
user's message before the call, the model's reply after. Next turn you send the whole list again, so
the model sees the full history and can refer back. Forget to append the reply (the M4 lab's
deliberate bug) and your bot has amnesia.
flowchart TB
U1["append user turn"] --> C1["send WHOLE list → model"]
C1 --> R1["append assistant reply"]
R1 --> U1
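The append, send, append loop can be sketched as one small function. This is an illustrative sketch, not the M4 lab's solution: chat_turn and make_claude_sender are hypothetical names, and send is any function that takes the messages list and returns the reply text, which makes the loop easy to try with a fake model before spending tokens.

```python
from typing import Callable

Message = dict[str, str]


def chat_turn(history: list[Message], user_text: str,
              send: Callable[[list[Message]], str]) -> str:
    """Run one turn: append the user message, send the WHOLE list, append the reply."""
    history.append({"role": "user", "content": user_text})
    reply = send(history)  # the model sees every earlier turn
    history.append({"role": "assistant", "content": reply})  # skip this and the bot forgets
    return reply


def make_claude_sender(client, model: str, system: str, max_tokens: int = 1024):
    """Wrap the messages.create call from the request example as a send function."""
    def send(messages: list[Message]) -> str:
        response = client.messages.create(
            model=model, max_tokens=max_tokens, system=system, messages=messages,
        )
        return response.content[0].text
    return send
```

With a real client you would build send once, for example make_claude_sender(anthropic.Anthropic(), "claude-haiku-4-5", "You are a friendly tutor."), start with history = [], and call chat_turn(history, input("You: "), send) inside a while loop.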
Choosing a model (and managing cost)
You picked claude-opus-4-8 above, the most capable model. You can swap it for a cheaper, faster
one by changing one string. Rough current options:
| Model id | Good for | Relative cost |
|---|---|---|
| claude-opus-4-8 | hardest reasoning, best quality | highest |
| claude-sonnet-4-6 | strong all-rounder | medium |
| claude-haiku-4-5 | quick, simple, high-volume | lowest (~5× cheaper than Opus) |
For learning and lots of practice runs, claude-haiku-4-5 is gentle on your balance; reach for
Opus when quality matters most. Set a spend limit in the Console as a backstop, and remember
each small message costs a fraction of a cent. (M6 returns to model and parameter choices.)
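One way to make the "change one string" swap painless is to keep the model ids in a single place. The sketch below is an assumption-heavy illustration: the tier names ("best", "balanced", "cheap"), the CHATBOT_MODEL_TIER variable, and pick_model are all invented for this example; only the model ids come from the table above.

```python
import os

# Model ids from the table above, keyed by made-up tier names.
MODELS = {
    "best": "claude-opus-4-8",
    "balanced": "claude-sonnet-4-6",
    "cheap": "claude-haiku-4-5",
}


def pick_model(default_tier: str = "cheap") -> str:
    """Choose a model id from a (hypothetical) CHATBOT_MODEL_TIER setting."""
    tier = os.environ.get("CHATBOT_MODEL_TIER", default_tier)
    if tier not in MODELS:
        raise ValueError(f"Unknown tier {tier!r}; choose one of {sorted(MODELS)}")
    return MODELS[tier]
```

Passing model=pick_model() to messages.create then defaults to the cheapest option for practice runs, and adding CHATBOT_MODEL_TIER=best to your .env switches to Opus without touching the code.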
Go deeper (optional, not needed for today's win)
- **Why `client.messages.create` and not `requests.post`?** Under the hood it *is* an HTTPS POST, but the official **SDK** (the `anthropic` library) wraps it: it builds the request, adds your key, retries on transient errors, and gives you typed objects back. Less to get wrong.
- **`content` is a list** because a reply can contain more than text (e.g. tool calls in M9). With a plain chat reply, `content[0]` is the text block.
- **Tokens, briefly:** models don't see letters or whole words but *tokens* (common chunks). "cat" is one token; "antidisestablishmentarianism" is several. Billing and limits are per token.
- **Other providers** (OpenAI, Google, etc.) work the same way: an account, a key in `.env`, an SDK, `messages`-style calls. Learn the pattern once; the rest is renaming.
Check yourself
Lock in today's win: answer each in your head, then reveal.
1. From a builder's point of view, what does an LLM do?
Show answer
Text in → text out. You send a prompt; it returns plausible continuing text. Everything (answers, code, role-play) is that one capability steered by your input. You call it over an API; you don't run it on your laptop.
2. Why must your API key live in .env and not in your .py file?
Show answer
It's a paid secret: proof of identity and what your usage is billed to. In code (especially
committed to Git) it can leak and be abused to run up your bill. Keep it in .env, load it with
python-dotenv, ignore .env in Git, and only commit a placeholder .env.example.
3. The API is "stateless." What does that mean for building a chatbot?
Show answer
The API remembers nothing between calls; it only sees the messages you send in this request.
To make a bot that remembers, keep a running messages list and append both the user's
message and the model's reply each turn, resending the whole list. Skip appending the reply and
the bot forgets.
4. What does the system prompt do?
Show answer
It sets the model's role, personality, and rules for the whole conversation: the standing brief, separate from the user turns. Changing it (e.g. "You are a patient Python tutor") changes the bot's whole behavior without touching the rest of the code.
5. You want the same chatbot but cheaper for practice. What do you change?
Show answer
The model string: e.g. from "claude-opus-4-8" to "claude-haiku-4-5" (roughly 5×
cheaper and faster). One-line change. Also set a spend limit in the Console as a safety net.
New words (also in resources/glossary.md): large language model
(LLM), prompt, API, SDK, API key, authentication, .env, environment variable, token, max_tokens,
message, role (user/assistant), system prompt, stateless, messages.create.
Source: original, written for this course. API usage (the anthropic SDK, messages.create,
model IDs, the claude-opus-4-8 default) follows Anthropic's official Claude API documentation and
was verified against the installed SDK; the chatbot and setup flow are original. No third-party text
or figures; diagrams are original.