Glossary: Course 02 (AI Engineering)
Plain-language definitions, in the words a first-day beginner would understand. Every new term introduced in a module gets a line here. Kept alphabetical. The (M#) tag says which module introduced it.
12-factor config : the practice of reading all configuration from environment variables rather than hardcoding it, so the same build runs unchanged across dev, staging, and production (only env vars differ) and secrets never live in the code or repo. (M29)
A/B test (prompts): running two prompts on the same input to see which gives the better output; the core way to tell whether a prompt change actually helped. (M5)
Accountability: being answerable for what an AI system does, by logging decisions, enabling review/appeal, and keeping a human responsible. (M14)
Activate (a virtual environment): the command that tells your terminal to use this project's library box; afterwards your prompt shows (.venv). (M3)
Agent: a model plus tools plus a loop: it can call functions to take actions, read the results, and decide the next step (not just produce text). (M9)
Agent framework: a library that writes the agent loop for you (ask → call tool → feed result back → repeat) and adds conveniences: memory, retries, streaming, multi-agent coordination, tracing. Examples: LangGraph, CrewAI, AutoGen, smolagents, LlamaIndex, the OpenAI Agents SDK, the Claude Agent SDK. All wrap the same loop you can write by hand (M9). (M19)
Agentic RAG : retrieval-augmented generation where retrieval is a TOOL the agent calls inside its loop, rather than a single fixed step. The agent decides whether to search, reads the results, refines its query and searches again if needed, then answers. Handles multi-hop and exploratory questions that one-shot RAG (M7-M8) cannot. (M24)
AgentOps: operating agents specifically in production (traces, memory, reliability, tool use, and multi-step failure handling); the agent-shaped slice of operations support. (Part E)
AGI (artificial general intelligence): a hypothetical human-level, all-round AI. A separate debate from the practical, narrow AI you build with today. (M0)
AI engineering: building applications on top of existing AI models (via APIs) by engineering the prompts, data, tools, guardrails, and deployment around the model. Contrast an ML engineer (trains/serves custom models) and a researcher (invents models). (M0)
AIOps: using software (and sometimes ML) to handle the flood of operational signals (correlating alerts, spotting anomalies, triaging) so humans focus on the few real incidents. (M32)
Alert correlation : grouping many alerts that share a cause (same service and symptom, close in time) into a single incident, so the team pages on causes, not symptoms. The core of AIOps. (M32)
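A minimal sketch of the grouping rule described above, assuming each alert is a dictionary with hypothetical "service", "symptom", and "time" (seconds) fields and a 5-minute window; real AIOps tools use richer signals than this.

```python
def correlate(alerts: list[dict], window_seconds: int = 300) -> list[list[dict]]:
    """Group alerts with the same service and symptom, close in time, into incidents."""
    incidents: list[list[dict]] = []
    for alert in sorted(alerts, key=lambda a: a["time"]):
        for incident in incidents:
            last = incident[-1]
            same_cause = (last["service"], last["symptom"]) == (alert["service"], alert["symptom"])
            if same_cause and alert["time"] - last["time"] <= window_seconds:
                incident.append(alert)   # same cause, close in time: same incident
                break
        else:
            incidents.append([alert])    # no matching incident: start a new one
    return incidents

# Hypothetical alerts for illustration only.
alerts = [
    {"service": "api", "symptom": "5xx", "time": 0},
    {"service": "api", "symptom": "5xx", "time": 60},
    {"service": "db", "symptom": "slow", "time": 30},
    {"service": "api", "symptom": "5xx", "time": 1000},
]
print(len(correlate(alerts)))  # number of incidents to page on
```

Four alerts collapse into three incidents here: the two early API alerts share a cause, while the late one falls outside the window and is treated as new.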
Allowlist : a list of the only values an action is permitted to use (for example, the only email domains a tool may send to, or the only URLs it may fetch). A least-privilege control: everything not on the list is blocked by default, which contains the damage if an agent is hijacked. (M23)
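A minimal sketch of an allowlist check for a URL-fetching tool. The host names are hypothetical; a real list would come from configuration.

```python
from urllib.parse import urlparse

# Hypothetical allowlist for illustration only.
ALLOWED_HOSTS = {"api.example.com", "docs.example.com"}

def is_allowed(url: str) -> bool:
    """Return True only if the URL's host is explicitly on the allowlist."""
    host = urlparse(url).hostname   # None when the text is not a proper URL
    return host in ALLOWED_HOSTS    # anything not listed is blocked by default

print(is_allowed("https://api.example.com/v1/items"))
print(is_allowed("https://evil.example.net/steal?key=abc"))
```

The check compares the whole host name, so a look-alike such as api.example.com.evil.net is still blocked.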
and / or / not: logical words that combine yes/no conditions: and (both true), or (at least one true), not (flips true↔false). (M2)
Answer match rate: in a RAG eval, the share of questions whose final answer contains the right fact (including correctly saying "I don't know"). (M8)
API: Application Programming Interface, a doorway one program opens for another. You call an AI model over its API: send a request, get a response. (M4)
API key: a secret string (like sk-ant-...) that proves a request is yours and is what your usage is billed to; keep it in .env, never in code. (M4)
append(): a list method that adds an item to the end of a list: movies.append("Jaws"). (M2)
Argument: the actual value you pass into a function when you call it: in categorize(62), 62 is the argument. (M3)
Assignment (=): putting a value into a variable: name = "Ada" means "let name be Ada." It's not the "equals" of maths. (M1)
Attention: the transformer mechanism that lets a model focus on the relevant earlier tokens when predicting the next one. (M0)
Authentication: proving who you are to a service; your API key does this for AI calls. (M4)
Base64: a way to encode raw bytes (like an image) as safe text so they can travel inside a JSON request. (M12)
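A short sketch of the idea using Python's standard base64 and json modules. The bytes are the fixed 8-byte signature that starts every PNG file, standing in for a whole image; the "data" field name is only illustrative.

```python
import base64
import json

image_bytes = b"\x89PNG\r\n\x1a\n"  # the 8-byte signature at the start of a PNG file

# Raw bytes cannot go inside JSON, so encode them as plain ASCII text first.
encoded = base64.b64encode(image_bytes).decode("ascii")
payload = json.dumps({"data": encoded})   # "data" is an illustrative field name
print(payload)

# The receiver reverses the steps to get the original bytes back.
decoded = base64.b64decode(json.loads(payload)["data"])
print(decoded == image_bytes)
```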
Batch API : an asynchronous mode that processes non-urgent requests for roughly half the price, in exchange for not being instant. Good for bulk classification, overnight evals, and backfills; a cost-for-latency trade. (M25)
Bias & fairness: models learn human biases from training data and can treat some groups unfairly; test across cases and keep a human in the loop for consequential decisions. (M0)
Bias probe: a fairness test: run the same task while changing only a sensitive attribute and compare outputs; differences are signals to investigate (not automatic proof). (M14)
Block: the group of indented lines that belong to an if, a loop, or a function; they run together. (M2)
Blocking call: a request where your program waits for the whole reply before continuing; the opposite of streaming. (M6)
Boolean (bool): a value that is either True or False; the type used for yes/no, on/off, decisions. (M1)
Burn rate : how fast an error budget is being spent. 1x uses exactly the budget over the window; 10x spends it ten times too fast. You alert on burn rate, not raw error count. (M31)
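The arithmetic can be sketched as below, assuming burn rate is taken as the observed error rate divided by the error budget (1 minus the SLO). The numbers are hypothetical.

```python
def burn_rate(error_rate: float, slo: float) -> float:
    """How many times faster than allowed the error budget is being spent."""
    budget = 1.0 - slo            # error budget: the failure fraction the SLO allows
    return error_rate / budget

# Hypothetical numbers: a 99% SLO allows 1% of requests to fail.
print(round(burn_rate(error_rate=0.01, slo=0.99), 2))  # spending exactly on budget
print(round(burn_rate(error_rate=0.10, slo=0.99), 2))  # spending ten times too fast
```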
Canary release : sending a new build to a small slice of traffic (or scoring it against the live version on an eval set) before promoting it to everyone, so a bad release is caught with a tiny blast radius. (M33)
Cancellation (abort) : letting a user stop a running agent mid-task. With a streaming generator it is free: stop consuming events and the agent stops working, so no further tokens or cost are produced. In a web app a closed SSE connection or an abort signal does the same. (M28)
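A small sketch of why a streaming generator makes cancellation free. run_agent and work_log are hypothetical stand-ins: each loop iteration represents one model or tool call.

```python
work_log = []  # records every unit of work the agent actually performed

def run_agent(steps: int):
    """Hypothetical streaming agent: does one unit of work, then yields one event."""
    for i in range(steps):
        work_log.append(i)       # stand-in for a model or tool call (the part that costs money)
        yield f"step {i}"

events_seen = []
for event in run_agent(steps=100):
    events_seen.append(event)
    if len(events_seen) == 3:    # stand-in for the user pressing "stop"
        break                    # stop consuming events

print(events_seen)
print(len(work_log))             # work done equals events consumed, not the 100 planned
```

Because a generator only runs when the consumer asks for the next event, breaking out of the loop means the remaining steps never execute.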
Capstone : a culminating project that integrates many separately-learned skills into one complete piece of work. Course 02's Part D capstone (M27) combines orchestration, agentic RAG, memory, observability, cost control, reliability, security, and an eval gate into a single deployable agent. (M27)
Catastrophic forgetting: when fine-tuning on a narrow task makes a model worse at general tasks. (M15)
Causal masking: stopping a language model from "seeing the future": it may only attend to tokens at or before the current position. (M17)
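As an illustration, the mask can be pictured as a grid of allowed/blocked positions. This plain-Python sketch only builds that grid; real implementations apply it to attention scores inside a tensor library.

```python
def causal_mask(n: int) -> list[list[bool]]:
    """mask[i][j] is True when position i may attend to position j (only j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

# Print the mask for a 4-token sequence: "x" = may attend, "." = blocked (the future).
for row in causal_mask(4):
    print(" ".join("x" if allowed else "." for allowed in row))
```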
Cell: one editable box in a notebook where you type code (or notes); you run cells one at a time. (M1)
Chain-of-thought: asking a model to "think step by step" / show its working before the final answer; improves accuracy on reasoning, logic, and arithmetic. (M5)
Checkpointer / conversation memory: what gives a LangGraph agent memory of earlier turns: a MemorySaver plus a thread_id so each conversation is remembered. (M9)
Chroma: a beginner-friendly vector store: one pip install, embeds your text locally, and finds chunks by meaning. (M7)
Chunk / chunking: splitting a document into smaller pieces (e.g. paragraphs) so retrieval can return the focused, relevant bit. (M7)
Chunk overlap: letting chunks share a few words at their edges so an idea split across a boundary isn't lost. (M8)
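A minimal sketch of word-based chunking with overlap. The function name and the choice to count words (rather than characters or tokens) are assumptions for illustration.

```python
def chunk_words(text: str, size: int, overlap: int) -> list[str]:
    """Split text into chunks of `size` words; each chunk repeats the last `overlap` words of the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    words = text.split()
    step = size - overlap                  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):     # this chunk already reached the end
            break
    return chunks

print(chunk_words("a b c d e f g h", size=4, overlap=1))
```

With size 4 and overlap 1, the word "d" ends the first chunk and starts the second, so an idea that straddles that boundary appears whole in at least one chunk.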
Circuit breaker : a pattern that, after several failures in a row, stops calling a failing service entirely for a cooldown period (fails fast) instead of retrying every request, then tests one request before fully resuming. Keeps a partial outage from becoming a full one. (M22)
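A compact sketch of the pattern, not a production implementation. The class name, defaults, and the injectable clock parameter are all assumptions made for illustration.

```python
import time

class CircuitBreaker:
    """Hypothetical minimal circuit breaker: open after repeated failures, retry after a cooldown."""

    def __init__(self, max_failures: int = 3, cooldown_seconds: float = 30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed (calls allowed)

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_seconds:
                raise RuntimeError("circuit open: failing fast")
            # Cooldown is over: fall through and let one trial request test the service.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()   # open (or re-open) the circuit
            raise
        self.failures = 0        # success: reset and fully resume
        self.opened_at = None
        return result
```

Usage would look like breaker.call(lambda: fetch_from_service()), where fetch_from_service is whatever flaky call you are protecting.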
Citation (grounding) : naming the specific documents an answer was drawn from. Citations let a human verify the answer and make hallucination obvious (a claim no cited document supports is a red flag). The trustworthy shape for a research or question-answering agent. (M24)
Class: a blueprint for making objects. You'll use objects from libraries (like rich's Table) long before writing your own classes. (M3)
Closed (proprietary) model: a model whose weights are private; you can only use it through the maker's API (Claude, GPT, Gemini). (M13)
Coding agent: an agent specialised for software work with file/edit/bash tools (e.g. Claude Code, Cursor, Codex); same ReAct loop, curated tool surface. (M9)
Comment: a note in the code for humans, starting with #; Python ignores it when running. (M1)
Comparison operator: a symbol that asks a yes/no question and answers True/False: == (equal), != (not equal), > < >= <=. (M2)
Concatenation: joining pieces of text end to end with +, e.g. "Hi, " + name. Both sides must be text, so convert a number with str() first. From M2 on, + also joins two lists. (M1)
Condition: the yes/no test an if or while checks (e.g. score >= 60); the block runs only when it's True. (M2)
Connector: a function (or service) an agent uses to reach an outside system: a threat-intel feed, a log store, a ticketing API. The kind of thing you'd expose as an MCP server (M16) so any agent can call it. (M18)
Container: a bundle of your app + its exact dependencies that runs identically anywhere; built/run with Docker. (M11)
Content block: one piece of a model's response; response.content is a list of them, and a normal reply's text is content[0].text. (M6)
Content moderation: screening user input or model output for harmful content (via a moderation API or classifier), layered with guardrails. (M14)
Context compaction: summarising older turns in a long agent/chat to free up context-window space so it can keep going. (M9)
Context engineering: deciding everything that goes into the model's context window (system prompt, examples, retrieved docs, tool results, history) and in what order; the bigger sibling of prompt engineering. (M5)
Context window: how much text (your prompt + the reply, measured in tokens) a model can hold at once; large but finite. (M0)
Continuous integration (CI) : automatically building and testing every change as it is pushed, so problems are caught immediately rather than later. For agents, CI runs the eval gate on each push (for example with GitHub Actions) and fails the build on a regression. (M26)
Cosine similarity: a common way to measure how "close" two embedding vectors are, using the cosine of the angle between them (1 = pointing the same way, 0 = unrelated); how nearest-chunk search ranks results. (M7)
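A plain-Python sketch of the calculation, assuming both vectors have the same length and neither is all zeros. Vector stores do this for you; the toy two-number vectors here are only for illustration.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (assumes equal length, non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # at a right angle
```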
create_react_agent: LangGraph's ready-made builder for a tool-using (ReAct) agent; you pass a model, tools, a prompt, and optionally a checkpointer. (M9)
Cross-cutting concern : a requirement that applies across the whole system rather than to one feature, such as observability, reliability, security, or cost. In an agent these are wrapped around the core loop (the capstone shows how), not bolted on at the end. (M27)
CUDA: NVIDIA's toolkit that lets libraries run on a GPU. Only relevant if you run AI models locally; this course uses hosted APIs, so you don't need it. (M3)
Curation (data) : the judgement step that turns raw logs into useful datasets: deduping, filtering low quality, balancing common vs rare cases, and routing ambiguous records to human review. Skipping it gives garbage-in-garbage-out, especially for fine-tuning. (M30)
Data exfiltration : an attacker getting secret or sensitive data OUT of a system. For agents, a hijacked tool is the channel: emailing a key to an attacker, or hiding data in a URL or image link the agent fetches. Defenses include least privilege on tools, allowlists, and redacting secrets from outbound content. (M23)
Data flywheel : the loop where using a deployed agent produces data (real questions plus feedback) that you curate into evals and training examples to improve the agent, which gets used more, producing more data. Usage compounds into quality over time. (M30)
Data minimization: collecting and sending only the data you actually need; a core privacy practice. (M14)
Data scientist: analyses data and builds predictive models to find insights; distinct from an AI engineer, who builds apps on existing models. (M0)
def: the keyword that defines (creates) a function: def greet(name):. (M3)
Defense in depth: stacking several independent safety layers (input, output, tool guards) so a bypass in one is caught by another; no single rule is a wall. (M10)
Delimiter (in prompts): clear markers (triple quotes, tags like <email>…</email>) that separate your instructions from the user's content, so the model doesn't confuse them. (M5)
Deployment: making your app run somewhere others can use it (a web service / container), not just on your laptop. (M11)
Dictionary: a collection of labelled values written as "key": value pairs in { }, e.g. {"name": "Alice"}; you look values up by key. The shape APIs and JSON use. (M2)
Distillation: training a small model to imitate a bigger one, to get a cheaper/faster model for a narrow job. (M15)
Docker: the tool that builds and runs containers from a Dockerfile. (M11)
Dockerfile: the recipe for a container image: base image, install steps, what code to copy, and how to start the app. (M11)
.dockerignore: like .gitignore, but for Docker builds: keeps files (e.g. .env) out of the image. (M11)
Drift : a gradual change in production inputs or model behaviour that degrades quality over time; caught by online evaluation on live traffic rather than a fixed offline test set. (Part E, M35)
elif: "else if": an extra condition checked only when the if (and earlier elifs) above it were False. (M2)
Embedding: a list of numbers (a vector) that captures the meaning of a text, so similar-meaning texts get nearby vectors. The basis of semantic search. (M7)
Embedding (token / position): a learned vector for each token (its meaning) and for each position (where it sits); the transformer's input. (M17)
Endpoint: one URL + method of a web API that does one thing, e.g. POST /chat. (M11)
.env file: a small file of KEY=value lines (like your API key) that your code loads at runtime; kept out of Git so secrets don't leak. (M4)
Environment variable: a KEY=value setting your program reads from its environment (e.g. ANTHROPIC_API_KEY); loaded from .env so secrets stay out of code. (M4)
Epoch: one full pass over the training dataset during fine-tuning; more epochs = more learning (and more overfitting risk). (M15)
Error budget : the amount of failure an SLO allows (1 minus the objective). While budget remains you can ship; when it is spent you stop and fix reliability. Makes "reliable enough" a number. (M31)
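A worked sketch of the arithmetic with hypothetical traffic numbers: how many failures the SLO allows over a window, and how many are left.

```python
def error_budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Failures still allowed in this window; a negative value means the budget is spent."""
    allowed_failures = (1.0 - slo) * total_requests   # budget = 1 minus the objective
    return allowed_failures - failed_requests

# Hypothetical window: a 99.9% SLO over 100,000 requests allows about 100 failures.
remaining = error_budget_remaining(slo=0.999, total_requests=100_000, failed_requests=40)
print(round(remaining))
print("ship" if remaining > 0 else "stop and fix reliability")
```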
Escalation : moving a ticket or page to a more senior tier (L1 to L2 to L3) when it is not handled in time, so nothing is silently dropped. (M31, M32)
EU AI Act: the EU's risk-based law regulating AI systems; one of several governance frameworks (with NIST AI RMF, ISO/IEC 42001) an AI engineer should be aware of. (M14)
Eval gate (threshold) : a pass/fail rule applied to an eval run, for example "at least 95% of cases must pass". A change that does not meet the threshold is blocked. The threshold is a product decision: stricter for critical agents, looser for fuzzier tasks. (M26)
Evaluation / eval set: a fixed list of questions with known-good answers you run after every change, turning "seems fine" into a comparable score. (M8)
Evaluation-driven development (EDD) : the practice of treating an agent's eval suite like tests: keep a versioned set of cases in the repo, run them automatically on every change, and block changes that drop quality. The agent equivalent of test-driven development; its core habit is "every bug becomes a test". (M26)
Exception: Python's word for a runtime error (like ValueError); try/except lets you catch one instead of crashing. (M3)
Excessive agency: giving an agent more power/tools/autonomy than the task needs, widening the harm if it errs or is tricked. Fix with least privilege + human approval for risky actions. (M10)
Exit code : the number a command returns when it finishes; 0 means success and any non-zero value means failure. CI passes or fails a build based on this, so an eval runner that exits non-zero when the gate fails is what lets CI block a bad merge. (M26)
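A minimal sketch of how an eval runner might turn a pass rate into an exit code. The function name and the 95% default are assumptions; a real runner script would finish with sys.exit(code) so CI sees the result.

```python
def gate_exit_code(passed: int, total: int, threshold: float = 0.95) -> int:
    """Return 0 (success) if the pass rate meets the threshold, otherwise 1 (failure)."""
    pass_rate = passed / total
    return 0 if pass_rate >= threshold else 1

# Hypothetical results; a real runner would count these from an eval run.
code = gate_exit_code(passed=18, total=20)
print(f"exit code: {code}")
# A runner script would end with: sys.exit(code)
```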
Four golden signals: the few metrics worth keeping on an operations dashboard: latency, traffic, errors, and saturation (how full your capacity is), usually shown alongside SLO burn. (Part E, extends M20)
f-string: text with an f before the quotes, where anything inside { } is replaced by a variable's value, e.g. f"Hi {name}". The clean way to put values in a sentence. (M1)
Fairness: an AI app not treating people differently based on attributes that shouldn't matter (gender, race, age…); tested by probing and improved by prompts/data/oversight. (M14)
Faithfulness: whether an answer is actually supported by the retrieved source text (distinct from whether it's true); a key thing RAG evals check. (M8)
FastAPI: a Python library that turns your functions into a web API with little code, plus request validation and an auto /docs test page. (M11)
FastMCP: the helper in the mcp Python SDK that turns a decorated function (@mcp.tool()) into a published MCP tool. (M16)
Feedback signal : information about whether an agent's answer was good. Explicit signals are thumbs up/down and corrections; implicit signals are behaviour like edits, "try again", or whether the session resolved. Explicit is cleaner; implicit is far more plentiful but noisier. (M30)
Few-shot prompting: including one or more worked examples (input → ideal output) in the prompt so the model copies the pattern. Zero-shot = no examples. (M5)
File: data stored on disk that outlives your program; you read it or write it with open(). (M3)
FileNotFoundError: the exception raised when you try to open a file that doesn't exist; catch it with try/except. (M3)
Fine-tuning: further-training a model on your own examples to change its behavior at scale. Out of scope here; try a better prompt (and RAG) first. (M5, preview)
Fine-tuning (SFT): continuing to train a pre-trained model on your own (input → ideal output) examples so a style/format/task is baked in. Good for behaviour, bad for new facts (use RAG). (M15)
Fixtures (recorded responses) : saved model outputs replayed during testing so evals are deterministic, fast, and free, instead of calling the live model on every run. Let CI gate reliably; live evals then run on a schedule to catch real model drift. (M26)
Float: a number that can have decimals, like 42.50. Use float("42.5") to turn text into one. (M1)
for loop: repeats a block once for each item in a collection: for item in things:. (M2)
Function: a named, reusable block of code you can call with inputs and that can hand back a result: def categorize(amount): .... (M3)
Function calling: another name for tool calling: the model emits a structured request to run one of your functions. (M9)
Generation failure: a RAG answer is wrong even though the right chunk was retrieved (model missed/mixed it up); fix the prompt or model. Contrast retrieval failure. (M8)
Golden dataset : a fixed set of test cases (each an input plus what a correct run looks like) that you run an agent against during evaluation. The "answer key" your scorers grade against. (M20)
Google Colab: a free website that runs Python in your browser, so there's nothing to install; we use it for M1-M2. (M1)
Graceful degradation (fallback) : when an agent cannot succeed (a real outage), returning a safe, calm response or a simpler result instead of crashing. Failing predictably and safely is better than failing ugly. (M22)
Graceful shutdown (draining) : on shutdown, a service stops accepting new requests (fails its readiness probe so the load balancer stops routing to it) and lets in-flight requests finish before exiting, so a deploy or scale-down does not drop live requests. (M29)
Gradient descent: the method training uses to nudge a model's weights toward better outputs (the maths you don't need to do yourself). (M15)
Grounding: keeping a model's answer tied to provided source text (and letting it say "I don't know" when the answer isn't there), rather than guessing. (M7)
Guardrail: a check around an AI app that blocks bad inputs, outputs, or actions (input / output / tool guards). (M10)
Hallucination: when a model states something false with confidence; RAG and grounding reduce it by giving the model real source text and restricting it to that text. (M7)
/health: a conventional endpoint that just reports the service is up; monitors and load balancers ping it. (M11)
Horizontal scaling : handling more load by running more identical replicas of a service behind a load balancer, rather than making one bigger. It requires the service to be stateless (any replica can serve any request), with shared state kept in an external store. (M29)
Hugging Face: the "GitHub of models": a hub of open models and datasets, plus the transformers library and an inference API to run them. (M13)
Human-in-the-loop: keeping a person in control of consequential actions (an agent proposes; a human approves). Essential for risky tools and security automation. (M10)
Idempotency : the property that doing the same action twice has the same effect as doing it once. It makes actions safe to retry; if an action is not idempotent (for example "send payment"), retrying it can double it, which is why risky actions need extra care. (M22)
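One common way to make a risky action safe to retry is an idempotency key; this technique is not part of the definition above, and every name in the sketch is hypothetical.

```python
processed: dict[str, str] = {}   # idempotency key -> result of the first attempt

def send_payment(idempotency_key: str, amount: int, ledger: list[int]) -> str:
    """Hypothetical payment action made safe to retry with an idempotency key."""
    if idempotency_key in processed:
        return processed[idempotency_key]   # a retry: return the earlier result, do nothing new
    ledger.append(amount)                   # the side effect happens only once
    processed[idempotency_key] = f"paid {amount}"
    return processed[idempotency_key]

ledger: list[int] = []
send_payment("order-42", 100, ledger)
send_payment("order-42", 100, ledger)       # a retry of the same request
print(ledger)                               # the payment was recorded once, not twice
```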
if / else: runs one block when a condition is True, and (optionally) another block when it's False. (M2)
Image (container image): the built package Docker produces from a Dockerfile; you run an image to get a container. (M11)
Image content block: the part of a message that carries a picture (base64 data + media_type), sent alongside a text block in the same Messages API call. (M12)
Image generation: creating a new image from a text description; needs a different tool than Claude (e.g. DALL·E, Stable Diffusion). (M12)
Improper output handling: trusting model output blindly (running it as code/SQL, rendering as HTML); treat model output as untrusted input. (M10)
Incident lifecycle : the loop run for every production failure: detect, triage, mitigate (stop the bleeding), resolve (fix the root cause), learn (postmortem). (M31)
Indentation: the spaces at the start of a line (4 by convention) that tell Python which lines are "inside" an if, loop, or function. Getting it wrong gives an IndentationError. (M2)
Index: an item's position number in a list, starting at 0: movies[0] is the first item. (M2)
Indirect prompt injection : a prompt-injection attack where the malicious instructions are hidden inside content the agent reads on someone's behalf (a document, web page, tool output, email), rather than typed by the user. The main threat once an agent has tools, because the agent cannot tell the embedded instructions from the data it was asked to process. (M23)
Inference: using a trained model to get a reply (what you do on every API call); cheap and fast, unlike the one-time, expensive training. (M0)
input(): asks the person a question, waits for them to type, and gives back what they typed (always as text). (M1)
Integer (int): a whole number with no decimal part, like 36. Use int("36") to turn text into one. (M1)
Interpreter (Python interpreter): the program that reads your Python instructions and carries them out. "Running Python" means handing your code to it. (M1)
IOC (indicator of compromise): an artifact that suggests a security incident: an IP address, domain, file hash, etc. A triage agent extracts IOCs from an alert; an enrichment agent looks up their reputation via a connector. (M18)
Iterate: to go through items one at a time (what a loop does). (M2)
JSON: a text format for data that looks almost exactly like a Python dictionary/list; the language APIs (including AI APIs) speak. (M3)
json.dump / load / dumps / loads: convert between Python data and JSON: dump/load use a file; dumps/loads use a string. (M3)
Key: the label half of a dictionary pair; you use it to look up its value: student["name"]. (M2)
KeyError: the error you get when you ask a dictionary for a key it doesn't have; use .get() to avoid it. (M2)
Knowledge cutoff: the date a model's training data stops; it doesn't know about events after that date (or about your private data at all), which is why it can be out of date (→ RAG). (M0)
LangGraph: an agent framework that models an agent as a stateful graph and runs the tool loop for you; create_react_agent is its ready-made ReAct agent. (M9)
Large language model (LLM): a program trained on huge amounts of text that continues text plausibly; from a builder's view, text in → text out, called over an API. (M4)
Latency: how long a request takes to get a response; a key thing to monitor (slow = users leave). (M11)
Layer normalization (layer-norm): rescaling values inside a network to keep training stable. (M17)
Least privilege: give an agent (or any component) the fewest tools/permissions it needs, so a mistake or attack does the least damage. The fix for excessive agency. (M10)
Library (package): reusable code published by others that you install with pip and bring in with import (e.g. rich). (M3)
List: a collection of values in order, written in [ ], e.g. [82, 45, 91]; you reach items by their index. (M2)
LiteLLM: a router that gives many model providers one common interface; frameworks that use it (CrewAI, smolagents, the OpenAI Agents SDK) address Claude with a provider-prefixed string like anthropic/claude-opus-4-8. (M19)
Liveness probe : a health check that asks "is the process alive?" (for example GET /healthz). If it fails, the platform restarts the container. It should be cheap and not depend on downstream services, or a slow dependency makes the platform kill a healthy process. (M29)
LLM-as-judge: using a second model call to grade an answer (e.g. "is this correct and grounded?") instead of simple keyword matching. (M8, go-deeper)
LLMOps: operating LLM-based systems in production (prompts, RAG, cost, evaluation, deployment, and the data behind them); the umbrella discipline for most of Part E. (Part E)
LM Studio: a desktop GUI app for downloading, chatting with, and serving local open models (an alternative to Ollama). (M13)
Load shedding : deliberately dropping or deferring low-priority requests during overload so the critical ones still succeed, instead of letting everything degrade at once. (Part E, M35; pairs with M22)
Local vs hosted: running a model on your own machine (free, offline, private, usually less capable) vs calling a maker's API (top quality, per-token cost, data leaves your machine). (M13)
Long-term memory (agent) : facts an agent stores durably (saved to disk) and recalls by relevance across sessions, for example a user's name or preferences. The retrieval idea from M7 (RAG) pointed at the agent's own memory; production uses a vector store with embeddings. (M21)
LoRA / PEFT: parameter-efficient fine-tuning: train a few small "adapter" layers instead of all the weights, so fine-tuning an open model is cheap and fits a modest GPU. (QLoRA adds quantization.) (M15)
Loss / cross-entropy: a number measuring how wrong a model's predictions are (cross-entropy = how "surprised" it was by the true next token); training drives it down. (M17)
max_tokens: the cap on how long a model's reply can be, measured in tokens. (M4)
MCP client: the side (inside an AI app) that connects to MCP servers, discovers their tools, and calls them on the model's behalf. (M16)
MCP (Model Context Protocol): a standard (not a framework) for connecting models/apps to tools and data sources through a common interface; "USB-C for AI tools." (M9)
MCP server: a program that exposes tools (and resources/prompts) over MCP so any MCP-aware app can discover and use them; built easily with FastMCP. (M16)
Media type: the label telling the API what a file is (image/png, image/jpeg, …); must match the actual file. (M12)
Message: one turn in a conversation sent to a model: a dictionary with a role and content. (M4)
messages.create(): the Claude SDK call that sends your messages to the model and returns its reply. (M4)
Method: a function that belongs to an object, called with a dot: table.add_row(...), "hi".upper(). (M3)
ML engineer: trains, tunes, and serves custom machine-learning models on a company's own data; contrast an AI engineer (builds on existing models). (M0)
MLP (multi-layer perceptron): the small feed-forward network inside each transformer block, paired with attention. (M17)
Modality: a type of data a model works with: text, image, audio, or video. (M12)
Model card: a short document of a model/app's intended use, limits, and evaluations; a transparency practice. (M14)
Model family: a maker's line of related models (e.g. Claude, GPT, Gemini, Llama), usually offered in several sizes/capabilities. (M0)
Model routing: sending each step of a workload to the cheapest model that can do it well: a small fast model (such as Haiku) for easy steps and a strong model (such as Opus) for hard ones. Cuts both cost and latency, but routing a hard step to a weak model hurts quality, so route conservatively and verify with evals (M20). (M25)
Monitoring: watching a running app's health and cost (e.g. logging latency and token usage per request). (M11)
Multi-agent system: several agents working together (a coordinator + specialists) instead of one agent doing everything. Common shapes: sequential pipeline, router, parallel fan-out, hierarchical. Worth it when sub-jobs are genuinely distinct; otherwise one agent is cheaper. (M18)
Multi-hop question : a question whose answer requires combining facts from more than one document, where you only learn what to search for next after reading the first result (for example: billing is run by the Payments team, and the Payments team is led by Dana Okafor). Needs more than one retrieval. (M24)
Multi-step: a task that takes more than one tool call / loop iteration (e.g. enrich an indicator, then search logs, then summarize). (M9)
Multimodal: a model that handles more than text (images, audio, sometimes video) as input and/or output. (M0)
Multimodal RAG: retrieval-augmented generation over images as well as text, using vision embeddings + a vector store. (M12)
Mutable: changeable after it's made; lists and dictionaries are mutable (you can add/change/remove items). (M2)
nanoGPT: Andrej Karpathy's minimal GPT implementation/tutorial; the classic way to learn how a transformer LM is built from scratch. (M17)
Neural network: software loosely inspired by brain cells: a web of simple units with many adjustable numbers (weights) that turns inputs into outputs; the basis of LLMs. (M0)
Next-token prediction: the core thing an LLM does: repeatedly predict the next chunk of text; everything it "does" is this, steered by the prompt. (M0)
No-code / low-code agent (e.g. n8n): building an agent on a visual canvas instead of in code: drag an "AI Agent" node, attach a chat-model node and tool nodes, connect them, and run. Same loop underneath; good for automation and non-developers. (M19)
Notebook: a page of cells you run one at a time; great for learning and experimenting (Colab notebooks are these). (M1)
Object: a bundle of data with its own functions (methods); libraries hand you objects to use, like rich's Console(). (M3)
Observability : being able to see what a program (here, an agent) actually did at each step, by recording it. For agents this means a trace of every model call and tool call, with inputs, outputs, tokens, timing, and errors. You cannot debug or improve what you cannot see. (M20)
Ollama: a friendly tool to download and run open models locally; it serves a local API at localhost:11434 you call like any web API. (M13)
On-call : the rotation of who carries the pager and responds to alerts. A good pager fires only when a human must act now; everything else is a ticket. (M31)
Online evaluation : scoring a sample of live production traffic continuously (LLM-as-judge, user feedback, proxy metrics) to catch quality drift a fixed offline eval set never anticipated. Complements the M26 gate. (Part E, extends M26)
open(): opens a file for reading or writing; best used with with open(...) as f: so it closes automatically. (M3)
Open-source / open-weight model: a model whose weights are published, so you can download and run it yourself (Llama, Mistral, Gemma, Qwen). Free per call, data stays local; you supply the hardware. (M13)
OpenTelemetry : an open standard for observability (traces, metrics, logs) with a defined set of GenAI spans for AI apps. Lets you record agent traces in a vendor-neutral format that many dashboards can read. (M20)
Operator vs builder : two stances toward a system: the builder ships it ("does it work?"), while the operator keeps it running ("is it still working, and how fast do we recover?"). Operations support is the operator's job. (Part E)
Orchestrator (coordinator agent): in a multi-agent system, the agent that decides which sub-agents run, in what order, and passes each one's output to the next. It coordinates rather than doing the specialist work itself. The thing you deploy in M18. (M18)
output_config (JSON schema): the request setting that hands the API a schema (field names + types) and guarantees the reply is valid JSON in that shape. (M6)
Overfitting: when a fine-tune parrots its training examples but generalizes poorly (too few or too-similar examples). (M15)
OWASP LLM Top 10: the standard catalogue of the top risks for LLM apps (prompt injection #1, sensitive-info disclosure, excessive agency, system-prompt leakage, …). (M10)
Parameter: a named input a function expects, listed in its def line: in def categorize(amount):, amount is the parameter. (M3)
Parameters (model): the billions of internal numbers a model learns during training; loosely, its "size." Bigger isn't always better for a given job. (M0)
Perceived latency : the wait a user actually experiences, as opposed to the server's total processing time. Streaming progress updates and showing the first words quickly make an agent feel fast even when the total time is unchanged; a blank screen makes the same wait feel broken. The biggest UX lever for LLM apps. (M28)
Perplexity: a standard language-model quality measure: the exponential of the cross-entropy loss (lower = predicts text better). (M17)
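As a rough illustration of the Perplexity entry above, the sketch below computes it from the probabilities a model assigned to each actual next token. The function name and the sample probabilities are made up for the example; real training code would read the loss from the model instead.

```python
import math


def perplexity(token_probs):
    """Perplexity from the probability the model gave each actual next token."""
    # Cross-entropy loss = average negative log-probability (natural log).
    cross_entropy = -sum(math.log(p) for p in token_probs) / len(token_probs)
    # Perplexity is the exponential of that loss.
    return math.exp(cross_entropy)


# A model that gives every correct token a 1-in-4 chance is "as unsure as
# picking between 4 equally likely options".
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

A model that is more confident in the right tokens (higher probabilities) gets a lower perplexity.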
PII (personally identifiable information): data that identifies a person (email, phone, ID, card number); minimise it and redact it before sending to a model. (M14)
pip: Python's tool for installing libraries: pip install rich, or pip install -r requirements.txt. (M3)
Postmortem (blameless) : the write-up after an incident that blames the system and its gaps, never a person, so people share what happened and the fix (often a new regression test) gets built. (M31)
Pre-training (training): the slow, expensive, one-time process where a model learns from huge amounts of text; done by the model maker, not by you. (M0)
print(): shows text or a value on the screen. (M1)
Program: a list of instructions a computer follows in order, top to bottom. (M1)
Prompt: the text you send a model to steer it; the words it continues from. (M4)
Prompt caching : reusing an identical, stable prefix (system instructions, examples, retrieved context) across many calls instead of paying full input price for it every time. The prefix is written to the cache once (about 1.25x normal input price) and read on later calls for about a tenth of the price. The highest-leverage, lowest-risk cost cut for agents, which reuse the same context constantly. (M25)
Prompt compression: shortening a prompt or context to use fewer tokens (cheaper/faster) without losing the important information. (M5)
Prompt engineering: shaping the instructions you give a model (role, rules, output shape, examples) so it reliably produces what you want. (M5)
Prompt injection: a malicious input that overrides a model/agent's instructions. Direct = the user types it; indirect = it hides in data the app pulls in (a web page, PDF, RAG chunk). OWASP's #1 LLM risk. (M10)
Python: the programming language this course uses; reads almost like English and is the main language of AI tools. (M1)
PyTorch: a library for training/running AI models on your own hardware. Optional for this course (we use hosted APIs); install via the official selector if you want it. (M3)
Quantization: storing a model's numbers at lower precision (e.g. 4-bit) so it uses far less memory and runs faster, for a small quality cost; how big models fit on a laptop. (M13)
RAG (retrieval-augmented generation): answer questions about your own documents by retrieving the relevant chunks, pasting them into the prompt, and having the model answer from them. (M7)
range(): produces a sequence of numbers for counting loops: for i in range(5): runs with i = 0,1,2,3,4. (M2)
Rate limit / quota : caps that protect a shared service: requests per key per minute (rate limit) and a spend or usage budget per tenant (quota), so one client cannot starve others or run away with cost. (Part E, extends M25/M29)
ReAct loop: the agent cycle: reason → act (call a tool) → observe (read the result) → repeat until done. (M9)
Readiness probe : a health check that asks "should this instance receive traffic right now?" (for example GET /readyz). It returns a failure (503) while the app is starting up or draining during shutdown, so the load balancer holds traffic back until the instance is ready. Distinct from liveness. (M29)
Red-teaming: deliberately attacking your own app (prompt injection, jailbreaks, data exfiltration) to find weaknesses before real attackers do. (M10)
Redaction: removing/masking sensitive data (e.g. PII) from text before it's sent or stored. (M14)
Reference-free metric : a quality signal computed without a known correct answer (refusal rate, citation presence, answer length, an LLM judge), used to score live traffic in online evaluation. (Part E, M35)
Regression : when a change (to a prompt, tool, model, or framework) silently breaks behaviour that used to work. Running an eval suite after every change is how you catch regressions, for the same reason you run unit tests. (M20)
requirements.txt: a text file listing a project's libraries so anyone can install them all with pip install -r requirements.txt. (M3)
Reranking: a sharper second retrieval pass: fetch a broad candidate set cheaply, then re-score it with a more precise method and keep only the best few. (M8)
Research agent : an agent whose job is to gather information: it searches a knowledge base or the web, reads, follows the trail with refined queries, and produces an answer with citations. Agentic RAG is the core pattern behind it. (M24)
Residual connection: adding a layer's input to its output, a trick that lets deep networks train well. (M17)
Responsible AI: building AI systems that are safe, fair, private, and honest about uncertainty; the engineer is responsible for the whole system, not just the model. (M0)
Retention (data) : the policy for how long data is kept before it is deleted (a TTL), so a store does not grow forever or hold sensitive data past its purpose. (M33)
Retrieval: the "find the relevant chunks" step of RAG; quality of the whole app depends on it. (M7)
Retrieval failure: the right chunk was never fetched, so the model never had the facts; fix chunking, k, or reranking (not the prompt). (M8)
Retrieval hit rate: in a RAG eval, the share of answerable questions where retrieval fetched the chunk holding the answer. (M8)
Retry with exponential backoff : on a transient error, try the call again after waiting, and make the wait longer each time (for example 0.5s, 1s, 2s). Backing off gives a struggling service room to recover instead of hammering it; production also adds small random "jitter" so many clients do not retry in lockstep. (M22)
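A minimal sketch of the Retry with exponential backoff entry above, using only the standard library. The names TransientError and retry_with_backoff are hypothetical (not from any SDK), and the 0.5s starting delay simply mirrors the example in the definition.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure (429, 503, timeout)."""


def retry_with_backoff(call, max_attempts=4, base_delay=0.5, jitter=0.1,
                       sleep=time.sleep):
    """Run call(); on a TransientError wait 0.5s, 1s, 2s, ... and try again."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            # Double the wait each time, plus a little random jitter so many
            # clients do not retry in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, jitter)
            sleep(delay)
```

The sleep parameter is there so a test can swap in a fake and avoid really waiting; in normal use you leave it alone. Permanent errors are not caught, so they fail immediately instead of being retried.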
return: hands a value back from a function to whoever called it; without it a function returns None. (M3)
RLHF: reinforcement learning from human feedback: how makers train a model to be helpful and safe after fine-tuning. (M15)
Role: which side a message is from: "user" (you) or "assistant" (the model). (M4)
Rollback : returning a deploy to the last-good version. Faster and safer than a forward-fix during an incident; never ship a change you cannot undo. (M33)
Runbook : a named, ordered checklist of safe, reversible mitigations for a known failure, so whoever is on-call can act correctly without having built the system. (M31)
Scorer (metric) : a small check in an evaluation that looks at an agent's answer and/or its trace and returns pass or fail, for example "did it call the multiply tool with the right arguments?". Rule-based scorers are deterministic and free; an LLM-as-judge scorer grades open-ended answers. (M20)
SDK: Software Development Kit: a library that wraps an API in friendly functions (the anthropic SDK wraps the Claude API). (M4)
Secret rotation : replacing a credential on a schedule, keeping the previous one valid for a short grace window so in-flight clients do not break (zero-downtime rotation). (M33)
Secrets manager: a host service that stores secrets (like API keys) and injects them into a running app at deploy time, so they're never in code or the image. (M11)
Security eval set: a fixed list of attacks (plus benign controls) you run after every change; targets are 0 leaks and 0 wrongly-blocked benign inputs. (M10)
Self-attention: the transformer mechanism where each token looks at earlier tokens and weighs which matter for predicting the next one. (M17)
Semantic search: finding text by meaning (nearest embedding vectors) rather than by keyword match. (M7)
Sensitive attribute: a characteristic (gender, race, age, name…) that should NOT change an AI's decision about a person. (M14)
Sensitive-information disclosure: when an app leaks secrets, personal data, or its own instructions; an OWASP LLM risk an output guard helps prevent. (M10)
Server-Sent Events (SSE) : a simple way to stream events one-way from server to browser over a kept-open HTTP response, sent as data: <payload> lines. A browser's EventSource reads them live. The easy transport for streaming an agent's output to a web UI; use WebSockets when you also need client-to-server streaming (for example live voice). (M28)
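To make the "data: <payload> lines" in the Server-Sent Events entry above concrete, here is a small sketch of how one event is framed as text. The helper name sse_event is made up for illustration; a web framework would write the returned string to the kept-open response.

```python
def sse_event(payload):
    """Frame one payload as a Server-Sent Event."""
    # Each line of the payload gets its own "data: " prefix,
    # and a blank line marks the end of the event.
    lines = payload.split("\n")
    return "".join(f"data: {line}\n" for line in lines) + "\n"


print(repr(sse_event("Hello")))  # 'data: Hello\n\n'
```

The blank line at the end matters: the browser only delivers the event once it sees it.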
Short-term memory (agent) : the running conversation an agent keeps within one session, trimmed to a token budget so it cannot grow forever. Lost when the program ends. Implemented as the recent message turns you feed back into each prompt. (M21)
SLA (service-level agreement) : the promise to users about service, for example a first-response time per severity. The SLO is your internal target; the SLA is the external commitment. (M32)
Slice: a piece of a list (or string) taken with [start:stop], stopping just before stop: movies[0:2]. (M2)
SLI / SLO (service-level indicator / objective) : the SLI is what you measure (for example, the percent of successful answers); the SLO is the target you hold it to (for example, 99%). Together they define "healthy" as a number. (M31)
SOC (Security Operations Center): the team that monitors and responds to security alerts; L1 triages, enriches, and summarizes alerts, and L2 correlates and investigates them. These are the tasks M9's security agent assists with. (M9)
Span : one recorded step inside a trace, for example a single model call or a single tool call, with its inputs, output, token count, duration, and status (ok or error). (M20)
Speech-to-text (STT): transcribing audio into text (e.g. Whisper); a separate model/tool from Claude. (M12)
Stale index : a RAG record whose source document has changed since it was indexed, so it answers from out-of-date content until it is re-embedded. (M33)
Standard library: the batteries-included libraries that come with Python (like json), needing no pip install. (M3)
Stateless: keeps no memory between calls; the AI API is stateless, so you resend the whole conversation each turn to give it context. (M4)
stdio transport: the simplest MCP connection: the client runs the server as a subprocess and they exchange messages over standard input/output (local, no network). (M16)
Step cap (runaway loop) : a hard limit on how many steps an agent may take in one run, so a confused or manipulated agent cannot loop forever and burn unlimited tokens. Observability (M20) shows the loop; the cap prevents the cost. (M22)
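One way to picture the Step cap entry above is a loop that counts its own steps. This is a sketch under assumptions: take_step stands in for "one turn of the agent loop" and returns (done, answer); both names, and the default of 8 steps, are hypothetical.

```python
class StepCapExceeded(RuntimeError):
    """Raised when the agent uses up its step allowance without finishing."""


def run_with_step_cap(take_step, max_steps=8):
    """Call take_step(step_number) until it reports done, or the cap is hit."""
    for step_number in range(1, max_steps + 1):
        done, answer = take_step(step_number)
        if done:
            return answer
    # The loop ended without an answer: stop here instead of spending more tokens.
    raise StepCapExceeded(f"agent did not finish within {max_steps} steps")
```

Raising an error (rather than returning a half-finished answer) makes a runaway run visible to the caller and to your traces.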
stop_reason: a field on the response saying why the model stopped: "end_turn" (finished), "max_tokens" (hit your cap), and others. (M6)
Streaming: receiving a reply in chunks as it's written (so words appear live) instead of waiting for it all; via client.messages.stream(...). (M6)
String (str): text, written inside quotes: "Ada" or 'hello'. (M1)
Structured logging : writing logs as machine-parseable records (JSON with fields) rather than free text, and stamping a shared request/trace id on every line so one request's whole story can be queried (log correlation). (Part E, extends M20)
Structured output: asking a model to return data in a fixed shape (usually JSON) so your code can use it directly. Prompt-only JSON isn't guaranteed (M5); the API can guarantee it (M6). (M5)
Sub-agent (specialist agent): an agent with one focused role (a short, sharp system prompt), called by an orchestrator. Splitting a job into sub-agents makes each part easy to test, swap, and improve independently, at the cost of more LLM calls. (M18)
Support tier (L1/L2/L3) : levels of support by depth: L1 front-line (common questions), L2 technical (real bugs), L3 engineering/escalation. Triage routes each ticket to the right tier. (M32)
Syntax error: Python couldn't read a line because of a typo in its shape (a missing quote, bracket, or colon). (M1)
System prompt: a standing instruction that sets a model's role, personality, and rules for the whole conversation. (M4)
System-prompt leakage: when an app reveals its hidden system instructions (which often contain secrets or logic); a specific OWASP LLM risk. (M10)
Temperature: a model setting (0-1) for randomness: lower = more focused/repeatable, higher = more varied/creative. The newest Opus models reject it (use claude-haiku-4-5/claude-sonnet-4-6 when you need it). (M6)
Terminal: a text window where you type commands (like python or pip) instead of clicking. (M3)
Text-to-speech (TTS): turning text into spoken audio; a separate tool (e.g. ElevenLabs, OpenAI TTS). (M12)
Time-to-first-token : how long until the first piece of the answer appears. Streaming minimizes it, which is what makes a response feel responsive; the total generation time can be the same. (M28)
Timeout : giving up on a call that takes too long, so one slow dependency cannot freeze the whole agent. The caller stops waiting and treats it as a transient error (which a retry can then re-attempt). (M22)
Toil : repetitive manual operational work that scales with traffic and could be automated; reducing toil (runbooks, automation, self-healing) is a core operations-support goal. (Part E)
Token: the unit a model reads and writes in, roughly ¾ of a word; usage and limits are counted in tokens. (M4)
Token bucket : a rate-limiting algorithm: a bucket holds up to N tokens, each request spends one, and tokens refill at a fixed rate; an empty bucket means the request is rate-limited (HTTP 429). (Part E, M35)
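The Token bucket entry above can be sketched in a few lines. This is an illustrative, single-process version; the class name, parameter names, and the injectable clock are assumptions made for the example, not a particular library's API.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.last = clock()

    def allow(self):
        """Return True and spend a token, or False if the bucket is empty."""
        now = self.clock()
        # Add the tokens earned since the last check, never above capacity.
        earned = (now - self.last) * self.refill_per_second
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the caller would answer with HTTP 429
```

For example, TokenBucket(capacity=2, refill_per_second=1) lets two requests through at once, rejects a third, and accepts another one a second later.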
Token budget : a cap on how many tokens of conversation an agent resends each turn. Because the model has a context limit and every token costs money and time, short-term memory drops or summarizes the oldest turns to stay under the budget. (M21)
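A minimal sketch of the Token budget entry above: keep the newest turns that fit and drop the oldest. The estimate of about 4 characters per token is a rough assumption for illustration only (a real app would use the provider's token counts), and both function names are hypothetical.

```python
def estimate_tokens(text):
    """Very rough guess: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def trim_to_budget(messages, budget):
    """Keep the most recent messages whose estimated tokens fit the budget."""
    kept = []
    used = 0
    # Walk from newest to oldest so the latest turns survive.
    for message in reversed(messages):
        cost = estimate_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore oldest-to-newest order
    return kept
```

This version simply drops old turns; the summarizing variant mentioned in the definition would replace them with a short summary message instead.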
Tool: a plain function you let a model call to take an action or fetch information (e.g. calculate, lookup_ioc). (M9)
Tool calling: the model emitting a structured request to run one of your tools (a.k.a. function calling). (M9)
Tool discovery: an MCP client asking a server "what tools do you have?" (list_tools) before calling any; the handshake that makes tools portable across apps. (M16)
Tool poisoning: an MCP/agent security risk where a malicious tool description tries to manipulate the model; review servers you didn't write. (M16)
tool_use / tool_result: the response block where the model asks to run a tool, and the message you send back with that tool's output (matched by tool_use_id). (M9)
top-k: how many nearest chunks retrieval returns per query (e.g. n_results=3). A larger k gives the model more context but adds noise and tokens. (M7)
Trace : the recorded history of one agent run, made of spans (one per step). Read top to bottom, it shows the agent thinking: a model call asks for a tool, the tool runs, a model call gives the final answer. The unit that tools like LangSmith, Langfuse, and OpenTelemetry record. (M20)
Training: adjusting a model's weights (via gradient descent) so its outputs improve; happens in stages (pre-training → fine-tuning → RLHF). (M15)
Transformer: the neural-network design (2017) behind modern LLMs; its "attention" mechanism weighs which earlier words matter for predicting the next one. (M0)
Transient error : a failure where the same call is likely to succeed if you wait and try again, for example a rate limit (429), a temporary server error (503), or a timeout. Worth retrying, unlike a permanent error (such as an invalid-request 400), which fails every time. (M22)
Transparency / disclosure: telling users when they're interacting with AI and, where it matters, how a decision was made. (M14)
Triage (support) : reading an incoming ticket or alert and assigning how urgent it is and who should handle it, with a confidence so uncertain cases go to a human. (M32)
try / except: attempt risky code in try; if it raises a matching error, run the except block instead of crashing. (M3)
Tuple: like a list but immutable (can't be changed), written with ( ), e.g. (40.71, -74.00). (M2)
Type: what kind of value something is (text, number, true/false); it decides what you're allowed to do with it. (M1)
Type conversion (casting): turning a value from one type into another: float("42.5"), int("36"), str(36). (M1)
usage: token counts on the response (input/output); how you track what a call cost. (M6)
User prompt: the specific request you send each turn (the actual text to rewrite/classify/answer), as opposed to the standing system prompt. (M5)
uvicorn: the web server that runs a FastAPI app: uvicorn app:app --reload. (M11)
Value: the contents half of a dictionary pair (the data a key points to). (M2)
ValueError: the exception raised when a value has the right type but unusable contents for an operation, e.g. float("hello") (a string, but not a number); catch it with try/except. (M3)
Variable: a named box that holds a value so you can use it again later, e.g. name = "Ada". (M1)
Vector: a list of numbers; an embedding is a vector, and "nearby" vectors mean similar meaning. (M7)
Vector store: a database that stores chunks + their embeddings and quickly returns the nearest ones to a query (we use Chroma). (M7)
Video understanding: analysing or summarising video with a model (e.g. Gemini, or sampling frames into a vision model). (M12)
Virtual environment (venv): a private box of libraries for one project, created with python -m venv .venv and switched on by activating it. (M3)
Vision / image understanding: sending a model an image (plus a question) and getting a text answer; reading, describing, or extracting from pictures. (M12)
Web API: a way for other programs to call your app over HTTP (send a request, get a response); built here with FastAPI. (M11)
Weights (model): the billions of learned numbers that are the model; "open-weight" means these are published so you can run the model yourself. (M13)
while loop: repeats a block as long as a condition stays True; you must make the condition eventually become False or it loops forever. (M2)
Zero-shot: prompting with no examples (just instructions). Contrast with few-shot, which adds worked examples. (M5)