Notes: M4: Ship your first AI app

Three modules of Python were the runway; this is takeoff. Today your code talks to a real large language model (LLM), and you build a chatbot you fully understand. The magic word for this module is demystify: underneath, an AI app is your M3 toolkit plus a function call that sends some text and gets some text back, with a secret key for the door. No neural-network maths required to build with one.

What an LLM is, from a builder's view

A large language model is a program, running on someone else's powerful computers, that has read an enormous amount of text and learned to continue text plausibly. You give it some words (a prompt); it predicts good words to follow. That's the whole interface you need as a builder: text in → text out. Everything fancy (answering questions, writing code, role-playing a pirate) is that one trick, steered by what you put in.
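To make "continue text plausibly" concrete, here is a deliberately tiny toy: it counts which word most often follows each word in a sample text, then extends a prompt one word at a time. Real LLMs use huge neural networks over tokens, not word-pair counts; the only thing this sketch shares with them is the builder's interface, text in → text out. The names `train` and `continue_text` are hypothetical.

```python
from collections import Counter, defaultdict


def train(corpus: str) -> dict[str, Counter]:
    """Toy 'model': for every word, count which words follow it in the corpus."""
    words = corpus.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def continue_text(model: dict[str, Counter], prompt: str, n_words: int = 3) -> str:
    """Text in -> text out: repeatedly append the most frequent next word."""
    words = prompt.lower().split()
    for _ in range(n_words):
        options = model.get(words[-1])
        if not options:  # never seen this word, so the toy has nothing to add
            break
        words.append(options.most_common(1)[0][0])
    return " ".join(words)


corpus = "the cat sat on the mat . the dog sat on the rug ."
model = train(corpus)
print(continue_text(model, "the cat", 3))  # the cat sat on the
```

The point is the shape, not the quality: a prompt goes in, plausible-looking continuation comes out, and everything else is steering.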

You don't run the model on your laptop (it's far too big; recall M3's PyTorch box). Instead you call it over the internet through an API: an Application Programming Interface, a doorway one program opens for another. You send a request; the model's servers send a response.

```mermaid
flowchart LR
  Code["Your Python<br/>messages + API key"] -->|HTTPS request| Model["Claude API<br/>(LLM on Anthropic's servers)"]
  Model -->|response: text| Code
  Code --> Screen["printed reply"]
```

The API key: a paid secret

To use the API you need an API key: a secret string (it looks like sk-ant-...) that does two jobs: it proves the request is yours (authentication), and it's what your usage is billed against. Treat it exactly like a password to a service you pay for. The whole of M4's setup is: make a key → store it safely → prove it works.

Where the key lives matters more than anything else today. You keep it in a file named .env (just KEY=value lines), load it at runtime with the python-dotenv library, and never write it into your .py files or commit it to Git. Why so strict? A leaked key (e.g. pushed to a public repo) can be found by bots in minutes and used to run up your bill. The habit (secrets in .env, .env in .gitignore, and only a placeholder .env.example in Git) is real security practice you'll keep for every project after this. (Full steps + troubleshooting: api-keys.md.)
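For intuition, here is a simplified, standard-library-only sketch of what `load_dotenv()` does for you: copy `KEY=value` lines from `.env` into the process environment, where the `anthropic` SDK looks for `ANTHROPIC_API_KEY`. It is not a replacement for python-dotenv, which handles quoting, escapes, and other edge cases properly; `load_env_file` and `mask` are hypothetical helper names. In your real code, just call `load_dotenv()`.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> bool:
    """Simplified stand-in for python-dotenv's load_dotenv().

    Copies KEY=value lines from the file into os.environ.
    Returns False if the file does not exist.
    """
    env_path = Path(path)
    if not env_path.exists():
        return False
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, _, value = line.partition("=")
        value = value.strip().strip('"').strip("'")  # drop simple surrounding quotes
        # Like load_dotenv's default, never overwrite a variable that is already set.
        os.environ.setdefault(key.strip(), value)
    return True


def mask(secret: str) -> str:
    """Return a recognisable but safe-to-print form of a key: prefix + last 4 chars."""
    if len(secret) <= 11:
        return "***"
    return secret[:7] + "..." + secret[-4:]
```

Usage: call `load_env_file()` once at startup, then `print(mask(os.environ["ANTHROPIC_API_KEY"]))` to confirm the key loaded without ever printing the whole secret.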

The request: messages, system, and a model

A call to the model has a few parts. Here's the smallest real one:

```python
import anthropic
from dotenv import load_dotenv

load_dotenv()                           # copy ANTHROPIC_API_KEY from .env into the environment
client = anthropic.Anthropic()          # reads your key from the environment
response = client.messages.create(
    model="claude-opus-4-8",            # WHICH model answers
    max_tokens=1024,                    # cap on how long the reply can be (in tokens)
    system="You are a friendly tutor.", # the model's personality + rules
    messages=[{"role": "user", "content": "Hi!"}],   # the conversation so far
)
print(response.content[0].text)         # the reply text
```

- messages is a list of turns, each a dictionary (your M2 dictionaries again!) with a role ("user" or "assistant") and content (the text). This list is the conversation.
- system is a special instruction that sets the model's role, rules, and personality. It's not part of the back-and-forth; it's the standing brief. (You'll go deep on this in M5.)
- model picks which model answers (more below).
- max_tokens caps the reply length, measured in tokens. A token is roughly ¾ of a word; it's the unit models read and write in, and the unit you're billed in. (M6 explores this knob and others.)
- The reply comes back in response.content, a list of blocks; the text is content[0].text.
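Since a token is roughly ¾ of a word, dividing a word count by 0.75 gives a ballpark token count, and multiplying a token cap by 0.75 gives a ballpark word budget. This is only a rule of thumb (real tokenizers split text differently, and the SDK reports exact counts in `response.usage`); `rough_token_estimate` and `rough_word_budget` are hypothetical helpers.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb from the notes: 1 token ≈ ¾ of a word


def rough_token_estimate(text: str) -> int:
    """Ballpark token count from a word count (NOT a real tokenizer)."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def rough_word_budget(max_tokens: int) -> int:
    """Roughly how many words a given max_tokens cap allows."""
    return int(max_tokens * WORDS_PER_TOKEN)


print(rough_token_estimate("You are a friendly tutor."))  # 5 words -> about 7 tokens
print(rough_word_budget(1024))                            # about 768 words
```

So `max_tokens=1024` leaves room for a reply of very roughly 750 words; if replies get cut off mid-sentence, that cap is the first thing to check.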

Why you send the whole conversation every time

Here's the idea that makes chatbots click: the API is stateless, meaning it remembers nothing between calls. Each request is judged only on the messages you send with it. So to make a bot that "remembers", you keep a running messages list and append both sides every turn: the user's message before the call, the model's reply after. Next turn you send the whole list again, so the model sees the full history and can refer back. Forget to append the reply (the M4 lab's deliberate bug) and your bot has amnesia.

```mermaid
flowchart TB
  U1["append user turn"] --> C1["send WHOLE list → model"]
  C1 --> R1["append assistant reply"]
  R1 --> U1
```
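The loop in the diagram fits in a few lines. To keep this sketch runnable without a key, the model call is passed in as a parameter: `call_model` is any function that receives the full messages list and returns the reply text. `chat_turn` and `echo_model` are hypothetical names; in the real bot, `call_model` would wrap `client.messages.create(...)`.

```python
from typing import Callable

Message = dict[str, str]


def chat_turn(messages: list[Message], user_text: str,
              call_model: Callable[[list[Message]], str]) -> str:
    """One turn of a 'remembering' chat built on a stateless API."""
    messages.append({"role": "user", "content": user_text})   # 1. append user turn
    reply = call_model(messages)                                # 2. send the WHOLE list
    messages.append({"role": "assistant", "content": reply})  # 3. append reply (skip = amnesia)
    return reply


def echo_model(messages: list[Message]) -> str:
    """Stand-in for the real API: reports how much history it was sent."""
    return f"I can see {len(messages)} message(s); you last said {messages[-1]['content']!r}."


# With the real SDK (needs a key), call_model could be, for example:
#   lambda msgs: client.messages.create(model="claude-haiku-4-5", max_tokens=1024,
#                                       messages=msgs).content[0].text

history: list[Message] = []
print(chat_turn(history, "My name is Sam.", echo_model))
print(chat_turn(history, "What's my name?", echo_model))
print(len(history))  # 4: two user turns + two assistant replies
```

On the second turn the stand-in sees three messages (user, assistant, user), which is exactly what lets a real model answer "What's my name?". Delete step 3 and it would see only the two user messages.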

Choosing a model (and managing cost)

You picked claude-opus-4-8 above, the most capable model. You can swap it for a cheaper, faster one by changing one string. Rough current options:

| Model id | Good for | Relative cost |
|---|---|---|
| `claude-opus-4-8` | hardest reasoning, best quality | highest |
| `claude-sonnet-4-6` | strong all-rounder | medium |
| `claude-haiku-4-5` | quick, simple, high-volume | lowest (~5× cheaper than Opus) |

For learning and lots of practice runs, claude-haiku-4-5 is gentle on your balance; reach for Opus when quality matters most. Set a spend limit in the Console as a backstop, and remember each small message costs a fraction of a cent. (M6 returns to model and parameter choices.)
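Because the model is just a string, it helps to choose it in one place so switching really is a one-line change. This sketch falls back to the cheap model for practice runs and lets an optional environment variable override it; the variable name `CLAUDE_MODEL` and the helper `pick_model` are hypothetical, not something the SDK reads on its own.

```python
import os

# Model ids from the table above; the tier labels are this sketch's own shorthand.
MODELS = {
    "best": "claude-opus-4-8",
    "balanced": "claude-sonnet-4-6",
    "cheap": "claude-haiku-4-5",
}


def pick_model(default_tier: str = "cheap") -> str:
    """Return the model id to use.

    An explicit CLAUDE_MODEL environment variable (hypothetical name) wins;
    otherwise use the given tier, which defaults to the cheap model for practice.
    """
    override = os.environ.get("CLAUDE_MODEL")
    if override:
        return override
    return MODELS[default_tier]
```

Usage: pass `model=pick_model()` to `client.messages.create(...)`, and add a line like `CLAUDE_MODEL=claude-opus-4-8` to `.env` only when quality matters more than cost.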

Go deeper (optional, not needed for today's win)

- **Why `client.messages.create` and not `requests.post`?** Under the hood it *is* an HTTPS POST, but the official **SDK** (the `anthropic` library) wraps it: it builds the request, adds your key, retries on transient errors, and gives you typed objects back. Less to get wrong.
- **`content` is a list** because a reply can contain more than text (e.g. tool calls in M9). With a plain chat reply, `content[0]` is the text block.
- **Tokens, briefly:** models don't see letters or whole words but *tokens* (common chunks). "cat" is one token; "antidisestablishmentarianism" is several. Billing and limits are per token.
- **Other providers** (OpenAI, Google, etc.) work the same way: an account, a key in `.env`, an SDK, and `messages`-style calls. Learn the pattern once; the rest is renaming.
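To see what the SDK saves you from, here is a sketch that only *builds* the raw HTTPS POST with the standard library and never sends it. The endpoint `https://api.anthropic.com/v1/messages` and the `x-api-key`, `anthropic-version`, and `content-type` headers follow Anthropic's public API reference at the time of writing; treat the version string as an assumption and check the docs. The second half parses a hand-written sample (simplified, not real output) to show why the text lives inside the `content` list.

```python
import json
import urllib.request
from typing import Optional

API_URL = "https://api.anthropic.com/v1/messages"


def build_request(api_key: str, model: str, messages: list[dict],
                  max_tokens: int = 1024,
                  system: Optional[str] = None) -> urllib.request.Request:
    """Build (but do NOT send) the raw HTTPS POST that messages.create wraps."""
    body = {"model": model, "max_tokens": max_tokens, "messages": messages}
    if system is not None:
        body["system"] = system
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,               # authentication: your secret key
            "anthropic-version": "2023-06-01",  # API version header (check the docs)
            "content-type": "application/json",
        },
        method="POST",
    )


def reply_text(response_json: str) -> str:
    """Join the text blocks of a reply; non-text blocks (e.g. tool calls) are skipped."""
    data = json.loads(response_json)
    return "".join(b["text"] for b in data["content"] if b["type"] == "text")


# Hand-written, simplified reply body for illustration (not real API output).
SAMPLE = '{"role": "assistant", "content": [{"type": "text", "text": "Hi there!"}]}'

req = build_request("sk-ant-placeholder", "claude-haiku-4-5",
                    [{"role": "user", "content": "Hi!"}])
print(req.get_method(), req.full_url)  # POST https://api.anthropic.com/v1/messages
print(reply_text(SAMPLE))              # Hi there!
```

Actually sending it, handling HTTP errors, retrying on transient failures, and turning JSON into typed objects would all be on you; that is the work `client.messages.create` does for you.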

Check yourself

Lock in today's win: answer each in your head, then reveal.

1. From a builder's point of view, what does an LLM do?


Text in → text out. You send a prompt; it returns plausible continuing text. Everything (answers, code, role-play) is that one capability steered by your input. You call it over an API; you don't run it on your laptop.

2. Why must your API key live in .env and not in your .py file?


It's a paid secret: proof of identity and what your usage is billed to. In code (especially committed to Git) it can leak and be abused to run up your bill. Keep it in .env, load it with python-dotenv, ignore .env in Git, and only commit a placeholder .env.example.

3. The API is "stateless." What does that mean for building a chatbot?


The API remembers nothing between calls; it only sees the messages you send with each request. To make a bot that remembers, keep a running messages list, append both the user's message and the model's reply each turn, and resend the whole list. Skip appending the reply and the bot forgets.

4. What does the system prompt do?


It sets the model's role, personality, and rules for the whole conversation: the standing brief, separate from the user turns. Changing it (e.g. "You are a patient Python tutor") changes the bot's whole behavior without touching the rest of the code.

5. You want the same chatbot but cheaper for practice. What do you change?


The model string: e.g. from "claude-opus-4-8" to "claude-haiku-4-5" (roughly 5× cheaper and faster). One-line change. Also set a spend limit in the Console as a safety net.

New words (also in resources/glossary.md): large language model (LLM), prompt, API, SDK, API key, authentication, .env, environment variable, token, max_tokens, message, role (user/assistant), system prompt, stateless, messages.create.

Source: original, written for this course. API usage (the anthropic SDK, messages.create, model IDs, the claude-opus-4-8 default) follows Anthropic's official Claude API documentation and was verified against the installed SDK; the chatbot and setup flow are original. No third-party text or figures; diagrams are original.