Skip to content

M21 solution: agent memory and state

A small, dependency-free toolkit that gives an agent three kinds of memory: short-term (the conversation, on a token budget), long-term (facts saved to disk and recalled by relevance), and checkpointing (save and resume the whole state). The core runs offline with a mock model and a built-in recall, so no API key and no tokens are needed.

Files

File Role
memory.py ShortTermMemory (recent turns trimmed to a token budget) and LongTermMemory (remember / recall by shared-word similarity, plus save/load to JSON). approx_tokens estimates cost.
agent.py MemoryAgent: each turn it recalls relevant long-term facts into the system prompt, sends the short-term window plus the new message, then stores the turn. save_state / load_state checkpoint both memories. Injectable client for offline testing.
demo_mock.py Runs all three behaviors offline: short-term recall and budget trimming, long-term recall in a fresh session, and checkpoint resume. Start here.
../starters/add_memory_policy.py Upgrade short-term trimming from drop-oldest to summarize-oldest.

Run it

# offline, free, no key (mock model + built-in recall):
python demo_mock.py

# live (optional, costs a few tokens): put your key in .env first
cp ../starters/.env.example .env      # then edit .env and paste your key
python agent.py

How it works

  • Memory is prompt construction. The model stores nothing between calls. MemoryAgent.chat builds each prompt from recalled facts plus the recent window, which is the only reason the agent "remembers".
  • Short-term is budgeted. window() walks newest-first and stops at the token budget, so a long chat never blows up the context or the bill (M20). Ordering matters: the window is built before the new turn is added, so a turn never appears in its own window.
  • Long-term is retrieval. recall returns only the facts relevant to the current message, the same idea as RAG (M7) pointed at the agent's own memory. save/load make it survive a restart.
  • Checkpointing serializes both memories to JSON so a new process can resume exactly where it left off (the same role as LangGraph's checkpointers).

A deliberate limitation (and the production fix)

recall matches on shared content words (a small bag-of-words cosine with stopwords removed). This is free and offline, but it only matches when the query and the fact share words: "what is my name" finds "the user's name is Sam", while "what do you know about me" finds nothing. Production swaps this toy similarity for embeddings in a vector store (M7: Chroma, FAISS, hosted options), which match by meaning. The remember / recall shape is identical, so MemoryAgent is unchanged by the swap. We use the toy version so the lab runs with no key and no install; use a vector store for real semantic recall.

Verified

  • ShortTermMemory: window trims to the token budget and keeps the most recent turns.
  • LongTermMemory: recall ranks relevant facts and ignores irrelevant ones (Paris is not returned for a name query); save/load round-trips the facts.
  • MemoryAgent: a fresh session recalls a saved fact and injects it into the system prompt (verified by capturing what the mock model received); the reply reflects the recalled name and hobby.
  • Checkpoint: load_state restores identical long-term facts and short-term turns.
  • All files compile; demo_mock.py runs end to end offline. Live runs reuse the M4 key and cost tokens.