Skip to content

Lab M21: give your agent a memory

You'll need: your venv and the anthropic plus python-dotenv from M4. The core lab needs no API key and costs nothing (mock model + a tiny built-in recall). A live run at the end is optional. Time: about 45 minutes. Work in your breakout pair.

Heads up: a model remembers nothing on its own. "Memory" is just your code choosing what to put back into the next prompt. You will build three kinds: short-term (this chat), long-term (across sessions), and a checkpoint (save and resume). Nothing here can harm your computer.

This lab has two parts: - Part A: short-term memory and the token budget. - Part B: long-term memory across sessions, plus checkpoint and resume.

flowchart LR
  U["new message"] --> AG["MemoryAgent"]
  LT[("long-term facts<br/>on disk")] -->|recall relevant| AG
  ST["short-term window<br/>(recent turns, budgeted)"] --> AG
  AG -->|prompt = recalled facts + window + message| M["model"]
  M --> R["reply"]
  R --> ST

Part A: short-term memory (within one session)

Step 1: Set up

Copy the solution/ files and starters/.env.example into a folder. Activate your venv.

python -c "import anthropic, dotenv; print('deps ok')"
You should now see: deps ok. (If not: pip install anthropic python-dotenv, the M4 libraries.)

Step 2: Run the offline demo

python demo_mock.py
You should now see, under A. SHORT-TERM MEMORY:
turn 1: Sure: your name is Sam.
turn 2: Sure: your name is Sam.
short-term now holds 4 turns; 17 tokens in the window (budget 120 )
On turn 2 you asked "What is my name?" and the agent answered correctly, because turn 1 was still in its short-term window. That is memory: turn 1 was fed back into turn 2's prompt.

Step 3: Watch the budget trim old turns

The demo then shrinks the budget to 8 tokens.

You should now see:

window keeps only the most recent turns: ['Sure: your name is']
The window now holds only the newest turn; older ones are dropped to stay under budget. Open memory.py and read ShortTermMemory.window(): it walks newest-first and stops when the budget is full. This is why a long chat does not blow up your context or your bill.

Step 4: Prove it forgets past the budget

In a Python shell:

python -c "from memory import ShortTermMemory as S; m=S(token_budget=6); [m.add('user',f'turn {i} here') for i in range(4)]; print([t['content'] for t in m.window()])"
You should now see: only the last turn or two, not all four. Short-term memory is finite on purpose. (The optional challenge upgrades drop-oldest to summarize-oldest.)


Part B: long-term memory (across sessions) and checkpointing

Step 5: Recall a fact in a brand new session

Look at the demo's B. LONG-TERM MEMORY section (or rerun python demo_mock.py).

You should now see:

session 1 stored a durable fact and saved it to user_memory.json
session 2 (fresh conversation) asks: Sure: your name is Sam, and you enjoy hiking.
Session 2 is a brand new agent with an empty conversation, yet it answered correctly. It did NOT use short-term memory (there was none); it recalled the saved fact from long-term memory and put it in the prompt. Open agent.py and read _system_prompt: it calls self.long.recall(user_msg) and injects the hits.

Step 6: See recall pick relevant facts only

python -c "from memory import LongTermMemory as L; m=L(); [m.remember(f) for f in ['The user name is Sam and loves hiking.','The capital of France is Paris.']]; print(m.recall('what is my name'))"
You should now see: only the Sam fact, not the Paris fact. Recall returns what is relevant, so you spend tokens on useful context, not your whole history.

Honest note: this built-in recall matches on shared words, so "what is my name" finds "name is Sam". Ask "what do you know about me" and it finds nothing (no shared words), even though a human sees the link. Real systems use embeddings (the M7 vector store) to match by meaning. Same remember/recall shape, smarter matching. See notes.md section 3.

Step 7: Checkpoint and resume

The demo's C. CHECKPOINT section saves the agent's full state, then loads it into a brand new agent.

You should now see:

resumed agent still recalls: Sure: your name is Sam, and you enjoy hiking.
The resumed agent has the same long-term facts and short-term turns. save_state / load_state in agent.py write both memories to JSON, so the agent can pause and continue later, even after a restart. This is how long-running agents survive interruptions.

Step 8 (optional, costs a few tokens): a real conversation that remembers

Put your key in .env (copy .env.example), then:

cp .env.example .env      # then edit .env and paste your key
python agent.py
You should now see: the agent answer a follow-up using the fact you gave it, then save agent_state.json. Delete that file when done. Steps 1 to 7 need no key.

Step 9: Show it

Post in the chat the demo's B section: a fresh session greeting the user by name from long-term recall. One screenshot of an agent that actually remembers.


If you get stuck

  • ModuleNotFoundError: anthropic -> pip install anthropic python-dotenv (M4 libraries).
  • demo_mock.py cannot find memory/agent -> run it from inside the folder holding the solution .py files.
  • Recall returns nothing -> your query shares no words with the stored fact. Use a word that appears in the fact (Step 6), or read the embeddings note in notes.md.
  • ANTHROPIC_API_KEY error in Step 8 -> your .env is not named exactly .env, or the key line is wrong. See api-keys.md. Steps 1 to 7 do not need a key.

Check yourself

Where does an agent's memory actually live? Inside the model? No. The model remembers nothing between calls. Memory lives in your code (the message list and the fact store) and works only because you feed the relevant pieces back into each new prompt.
Why does short-term memory need a token budget? Because the conversation grows every turn, and you cannot resend it forever: the model has a context limit, and every token costs money and time (M20). The budget keeps the window bounded by dropping or summarizing the oldest turns.
What is the difference between short-term and long-term memory? Short-term is the current conversation, trimmed to a budget, gone when the program ends. Long-term is durable facts saved to disk and recalled by relevance across sessions. Together they let an agent hold a conversation now and remember you later.
Our recall matches shared words. What does production use, and why? Embeddings in a vector store (M7). They match by meaning, so a query and a fact can match even with no shared words. Our word-overlap version is a free, offline stand-in with the same remember/recall shape.