M24 solution: agentic RAG and research agents

A side-by-side of plain RAG (retrieve once, answer) and agentic RAG (retrieval is a tool the agent calls as many times as it needs), over a tiny synthetic corpus. The core runs offline with a mock model and a keyword search, so no API key and no tokens are needed.

Files

File	Role
`corpus.py`	A small knowledge base and a `search(query, k)` tool (keyword similarity). Designed so one question is multi-hop: D1 says billing is run by the Payments team, D3 says who leads Payments.
`agent.py`	`plain_rag(question)` retrieves once then answers; `agentic_rag(question)` runs the M9 loop with `search` as a tool, searching again with a refined query and citing the documents used. Returns sources and (for agentic) the list of searches. Injectable client.
`demo_mock.py`	Runs both on a multi-hop question and shows plain RAG fall short, agentic RAG answer with citations, and the agent skip search for small talk. Start here.
`../starters/add_corpus_doc.py`	Add documents that create a three-hop question.

Run it

# offline, free, no key (local corpus + mock model):
python demo_mock.py

# live (optional, costs a few tokens): put your key in .env first
cp ../starters/.env.example .env      # then edit .env and paste your key
python agent.py

What the demo shows

	searches	sources	answer
plain_rag	1 (fixed)	`[D1]`	states the team, cannot name the leader
agentic_rag	2 (question, then "who leads Payments")	`[D1, D3]`	names Dana Okafor, cites D1 and D3
agentic_rag on "hello"	0	none	answers directly, no retrieval

The point: the second search is only possible because the agent read the first result and refined its query. One-shot retrieval on the original question never reaches D3.

Design notes

Retrieval is a tool, not a fixed step. That single change (the M9 loop around search) is what makes RAG agentic: the agent decides whether, what, and how many times to retrieve.
Citations come from the document ids the agent actually retrieved, so the answer is verifiable and hallucination is easier to spot.
Keyword search is a stand-in. corpus.search matches words, not meaning. Production puts the M7 vector store (embeddings) behind the same search(query) tool; the loop is unchanged. That is the one swap from this teaching version to a real research agent.
Cost and guards. Each search and reasoning step is another model call (M20), so cap searches (max_searches, related to M22's step cap) and watch retrieval counts in observability.

Verified (offline)

corpus: the original question retrieves only D1; "who leads Payments" retrieves D3; a greeting retrieves nothing.
plain_rag: one retrieval, sources [D1], answer cannot name the leader.
agentic_rag: two searches, sources [D1, D3], answer names Dana Okafor with citations; zero searches for small talk.
All files compile; demo_mock.py runs end to end offline. Live runs reuse the M4 key and cost tokens.