M21 solution: agent memory and state

A small, dependency-free toolkit that gives an agent three kinds of memory: short-term (the conversation, on a token budget), long-term (facts saved to disk and recalled by relevance), and checkpointing (save and resume the whole state). The core runs offline with a mock model and a built-in recall, so no API key and no tokens are needed.

Files

File	Role
`memory.py`	`ShortTermMemory` (recent turns trimmed to a token budget) and `LongTermMemory` (remember / recall by shared-word similarity, plus save/load to JSON). `approx_tokens` estimates cost.
`agent.py`	`MemoryAgent`: each turn it recalls relevant long-term facts into the system prompt, sends the short-term window plus the new message, then stores the turn. `save_state` / `load_state` checkpoint both memories. Injectable client for offline testing.
`demo_mock.py`	Runs all three behaviors offline: short-term recall and budget trimming, long-term recall in a fresh session, and checkpoint resume. Start here.
`../starters/add_memory_policy.py`	Upgrade short-term trimming from drop-oldest to summarize-oldest.

Run it

# offline, free, no key (mock model + built-in recall):
python demo_mock.py

# live (optional, costs a few tokens): put your key in .env first
cp ../starters/.env.example .env      # then edit .env and paste your key
python agent.py

How it works

Memory is prompt construction. The model stores nothing between calls. MemoryAgent.chat builds each prompt from recalled facts plus the recent window, which is the only reason the agent "remembers".
Short-term is budgeted. window() walks newest-first and stops at the token budget, so a long chat never blows up the context or the bill (M20). Ordering matters: the window is built before the new turn is added, so a turn never appears in its own window.
Long-term is retrieval. recall returns only the facts relevant to the current message, the same idea as RAG (M7) pointed at the agent's own memory. save/load make it survive a restart.
Checkpointing serializes both memories to JSON so a new process can resume exactly where it left off (the same role as LangGraph's checkpointers).

A deliberate limitation (and the production fix)

recall matches on shared content words (a small bag-of-words cosine with stopwords removed). This is free and offline, but it only matches when the query and the fact share words: "what is my name" finds "the user's name is Sam", while "what do you know about me" finds nothing. Production swaps this toy similarity for embeddings in a vector store (M7: Chroma, FAISS, hosted options), which match by meaning. The remember / recall shape is identical, so MemoryAgent is unchanged by the swap. We use the toy version so the lab runs with no key and no install; use a vector store for real semantic recall.

Verified

ShortTermMemory: window trims to the token budget and keeps the most recent turns.
LongTermMemory: recall ranks relevant facts and ignores irrelevant ones (Paris is not returned for a name query); save/load round-trips the facts.
MemoryAgent: a fresh session recalls a saved fact and injects it into the system prompt (verified by capturing what the mock model received); the reply reflects the recalled name and hobby.
Checkpoint: load_state restores identical long-term facts and short-term turns.
All files compile; demo_mock.py runs end to end offline. Live runs reuse the M4 key and cost tokens.