M21: Agent memory and state (Part D: Agentic Systems)
Every agent you have built so far has amnesia. Close the program and it forgets your name, your preferences, everything you told it. Real assistants remember: across the conversation, and across sessions weeks apart. Today you give an agent a memory: short-term (this conversation, kept under a budget), long-term (facts that survive a restart and come back when relevant), and the ability to save its state and resume later. You will watch a brand new session greet you by name because it remembered.
Today's win: an agent that remembers you within a chat, recalls facts about you in a brand new session, and can be paused and resumed with its memory intact, all runnable offline.
Today you will
- Build short-term memory: the conversation history, trimmed to a token budget so it cannot grow forever
- Build long-term memory: facts saved to disk and recalled by relevance, across sessions
- Add checkpointing: save the agent's whole state and resume it later, even in a new process
- See why memory is just "what you put back into the prompt", and how production tools and vector stores (M7) scale it
Run of show (about 60 minutes)
| Time | What we do |
|---|---|
| 0:00 | Hook: the amnesiac agent problem |
| 0:05 | The one idea: memory is choosing what to put back into the next prompt (read notes.md) |
| 0:12 | Lab Part A: short-term memory and the token budget |
| 0:30 | Lab Part B: long-term memory across sessions, plus checkpoint and resume |
| 0:52 | Show: a fresh session that greets you by name |
| 1:00 | Wrap |
If you get stuck
- Builds on M7 (recall by meaning, the vector-store idea), M9 (the agent), and M20 (we keep an eye on tokens). Reuse your
.envkey only for the optional live run. - The core lab runs offline, free, no key (it uses a mock model and a tiny built-in recall). No new libraries to install. Nothing here can harm your computer.
- The built-in recall matches on shared words, not deep meaning. That is on purpose so it runs offline; the notes explain how real embeddings (M7) do better, and when you would switch.
Optional challenge
Open starters/add_memory_policy.py and replace "drop the oldest
turn" with "summarize the oldest turns into one short note". A good summary keeps the gist of a long
conversation for a fraction of the tokens, which is how real long-running agents remember far more
than their context window could hold.