M21 solution: agent memory and state
A small, dependency-free toolkit that gives an agent three kinds of memory: short-term (the conversation, on a token budget), long-term (facts saved to disk and recalled by relevance), and checkpointing (save and resume the whole state). The core runs offline with a mock model and a built-in recall, so no API key and no tokens are needed.
Files
| File | Role |
|---|---|
memory.py |
ShortTermMemory (recent turns trimmed to a token budget) and LongTermMemory (remember / recall by shared-word similarity, plus save/load to JSON). approx_tokens estimates cost. |
agent.py |
MemoryAgent: each turn it recalls relevant long-term facts into the system prompt, sends the short-term window plus the new message, then stores the turn. save_state / load_state checkpoint both memories. Injectable client for offline testing. |
demo_mock.py |
Runs all three behaviors offline: short-term recall and budget trimming, long-term recall in a fresh session, and checkpoint resume. Start here. |
../starters/add_memory_policy.py |
Upgrade short-term trimming from drop-oldest to summarize-oldest. |
Run it
# offline, free, no key (mock model + built-in recall):
python demo_mock.py
# live (optional, costs a few tokens): put your key in .env first
cp ../starters/.env.example .env # then edit .env and paste your key
python agent.py
How it works
- Memory is prompt construction. The model stores nothing between calls.
MemoryAgent.chatbuilds each prompt from recalled facts plus the recent window, which is the only reason the agent "remembers". - Short-term is budgeted.
window()walks newest-first and stops at the token budget, so a long chat never blows up the context or the bill (M20). Ordering matters: the window is built before the new turn is added, so a turn never appears in its own window. - Long-term is retrieval.
recallreturns only the facts relevant to the current message, the same idea as RAG (M7) pointed at the agent's own memory.save/loadmake it survive a restart. - Checkpointing serializes both memories to JSON so a new process can resume exactly where it left off (the same role as LangGraph's checkpointers).
A deliberate limitation (and the production fix)
recall matches on shared content words (a small bag-of-words cosine with stopwords removed). This is
free and offline, but it only matches when the query and the fact share words: "what is my name" finds
"the user's name is Sam", while "what do you know about me" finds nothing. Production swaps this toy
similarity for embeddings in a vector store (M7: Chroma, FAISS, hosted options), which match by
meaning. The remember / recall shape is identical, so MemoryAgent is unchanged by the swap. We use
the toy version so the lab runs with no key and no install; use a vector store for real semantic recall.
Verified
ShortTermMemory: window trims to the token budget and keeps the most recent turns.LongTermMemory: recall ranks relevant facts and ignores irrelevant ones (Paris is not returned for a name query);save/loadround-trips the facts.MemoryAgent: a fresh session recalls a saved fact and injects it into the system prompt (verified by capturing what the mock model received); the reply reflects the recalled name and hobby.- Checkpoint:
load_staterestores identical long-term facts and short-term turns. - All files compile;
demo_mock.pyruns end to end offline. Live runs reuse the M4 key and cost tokens.