M21: Agent memory and state (Part D: Agentic Systems)

Every agent you have built so far has amnesia. Close the program and it forgets your name, your preferences, everything you told it. Real assistants remember: across the conversation, and across sessions weeks apart. Today you give an agent a memory: short-term (this conversation, kept under a budget), long-term (facts that survive a restart and come back when relevant), and the ability to save its state and resume later. You will watch a brand new session greet you by name because it remembered.

Today's win: an agent that remembers you within a chat, recalls facts about you in a brand new session, and can be paused and resumed with its memory intact, all runnable offline.

Today you will

Build short-term memory: the conversation history, trimmed to a token budget so it cannot grow forever
Build long-term memory: facts saved to disk and recalled by relevance, across sessions
Add checkpointing: save the agent's whole state and resume it later, even in a new process
See why memory is just "what you put back into the prompt", and how production tools and vector stores (M7) scale it

Run of show (about 60 minutes)

Time	What we do
0:00	Hook: the amnesiac agent problem
0:05	The one idea: memory is choosing what to put back into the next prompt (read `notes.md`)
0:12	Lab Part A: short-term memory and the token budget
0:30	Lab Part B: long-term memory across sessions, plus checkpoint and resume
0:52	Show: a fresh session that greets you by name
1:00	Wrap

If you get stuck

Builds on M7 (recall by meaning, the vector-store idea), M9 (the agent), and M20 (we keep an eye on tokens). Reuse your .env key only for the optional live run.
The core lab runs offline, free, no key (it uses a mock model and a tiny built-in recall). No new libraries to install. Nothing here can harm your computer.
The built-in recall matches on shared words, not deep meaning. That is on purpose so it runs offline; the notes explain how real embeddings (M7) do better, and when you would switch.

Optional challenge

Open starters/add_memory_policy.py and replace "drop the oldest turn" with "summarize the oldest turns into one short note". A good summary keeps the gist of a long conversation for a fraction of the tokens, which is how real long-running agents remember far more than their context window could hold.