# M24 solution: agentic RAG and research agents
A side-by-side of plain RAG (retrieve once, answer) and agentic RAG (retrieval is a tool the agent calls as many times as it needs), over a tiny synthetic corpus. The core runs offline with a mock model and a keyword search, so no API key and no tokens are needed.
## Files
| File | Role |
|---|---|
| `corpus.py` | A small knowledge base and a `search(query, k)` tool (keyword similarity). Designed so that one question is multi-hop: D1 says billing is run by the Payments team; D3 says who leads Payments. |
| `agent.py` | `plain_rag(question)` retrieves once, then answers; `agentic_rag(question)` runs the M9 loop with `search` as a tool, searching again with a refined query and citing the documents used. Both return their sources; `agentic_rag` also returns the list of searches it made. The model client is injectable (mock or live). |
| `demo_mock.py` | Runs both on a multi-hop question and shows plain RAG falling short, agentic RAG answering with citations, and the agent skipping search for small talk. Start here. |
| `../starters/add_corpus_doc.py` | Add documents that create a three-hop question. |
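The README does not reproduce `corpus.py`, so the following is only a minimal sketch of a keyword-similarity `search(query, k)`, assuming documents are scored by how many non-stopword words they share with the query. The document texts, the `STOPWORDS` set, the `tokenize` helper, and the `(doc_id, text)` return shape are hypothetical; only the D1 and D3 facts come from the table above, and D2 is invented filler.

```python
import re

# Hypothetical stand-in corpus (the real corpus.py text is not shown here).
# D1 and D3 mirror the two facts the README describes; D2 is invented filler.
CORPUS = {
    "D1": "Billing is run by the Payments team.",
    "D2": "The Search team maintains the internal wiki and docs index.",
    "D3": "Dana Okafor leads the Payments team.",
}

# Function words that would otherwise create spurious matches.
STOPWORDS = {"a", "an", "the", "is", "of", "by", "in", "to", "for", "and", "who", "what", "which"}


def tokenize(text: str) -> set[str]:
    """Lowercase, split on non-alphanumerics, drop stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def search(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Return up to k (doc_id, text) pairs ranked by keyword overlap.

    Documents sharing no keyword with the query are never returned,
    so small talk like "hello" retrieves nothing.
    """
    q = tokenize(query)
    scored = []
    for doc_id, text in CORPUS.items():
        overlap = len(q & tokenize(text))
        if overlap > 0:
            scored.append((overlap, doc_id, text))
    # Highest overlap first; ties broken by doc id so results are deterministic.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(doc_id, text) for _, doc_id, text in scored[:k]]
```

On this toy corpus, "Which person is in charge of billing?" returns only D1 (the only shared word is "billing"), "who leads Payments" ranks D3 first, and "hello" returns an empty list. That is the gap the agent has to bridge: the original question carries no word that points at D3.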
## Run it
```bash
# offline, free, no key (local corpus + mock model):
python demo_mock.py

# live (optional, costs a few tokens): put your key in .env first
cp ../starters/.env.example .env   # then edit .env and paste your key
python agent.py
```
## What the demo shows
| run | searches | sources | answer |
|---|---|---|---|
| `plain_rag` | 1 (fixed) | `[D1]` | states the team, cannot name the leader |
| `agentic_rag` | 2 (the question, then "who leads Payments") | `[D1, D3]` | names Dana Okafor, cites D1 and D3 |
| `agentic_rag` on "hello" | 0 | none | answers directly, no retrieval |
The point: the second search is only possible because the agent read the first result and refined its query. One-shot retrieval on the original question never reaches D3.
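The difference between the two functions can be sketched as follows. This is not the code in `agent.py`: the `model(question, docs, can_search)` interface, the scripted `mock_model`, the dict-backed `fake_search`, and the question text are hypothetical stand-ins chosen so the sketch runs on its own. It shows the structural point only: `plain_rag` searches exactly once on the original question, while `agentic_rag` lets the model decide whether to search again, caps searches with `max_searches`, and builds `sources` from the ids it actually retrieved.

```python
from typing import Callable

Doc = tuple[str, str]  # (doc_id, text)
SearchFn = Callable[[str], list[Doc]]
# Hypothetical model interface: (question, docs_so_far, can_search) -> (action, payload),
# where action is "search" (payload = query) or "answer" (payload = answer text).
ModelFn = Callable[[str, list[Doc], bool], tuple[str, str]]

# Hypothetical canned results standing in for corpus.search.
FAKE_RESULTS = {
    "which person is in charge of billing?": [("D1", "Billing is run by the Payments team.")],
    "who leads payments": [("D3", "Dana Okafor leads the Payments team.")],
}


def fake_search(query: str) -> list[Doc]:
    return FAKE_RESULTS.get(query.strip().lower(), [])


def mock_model(question: str, docs: list[Doc], can_search: bool) -> tuple[str, str]:
    """Scripted stand-in for an LLM: picks the next step from what it has read."""
    text = " ".join(t for _, t in docs)
    if question.strip().lower() in {"hello", "hi", "thanks"}:
        return ("answer", "Hello! Ask me something about the corpus.")  # small talk: no retrieval
    if "Okafor" in text:
        return ("answer", "Dana Okafor leads Payments, the team that runs billing.")
    if can_search and not docs:
        return ("search", question)  # first hop: search the question as asked
    if can_search and "Payments team" in text:
        return ("search", "who leads Payments")  # second hop: refined query built from D1
    if "Payments team" in text:
        return ("answer", "Billing is run by the Payments team; the sources do not name its leader.")
    return ("answer", "The corpus does not answer this.")


def plain_rag(question: str, search: SearchFn, model: ModelFn) -> dict:
    docs = search(question)  # exactly one retrieval, on the original question
    action, payload = model(question, docs, False)  # the model may not search again
    answer = payload if action == "answer" else "No answer."
    return {"answer": answer, "sources": [doc_id for doc_id, _ in docs]}


def agentic_rag(question: str, search: SearchFn, model: ModelFn, max_searches: int = 3) -> dict:
    docs: list[Doc] = []
    searches: list[str] = []
    while True:
        can_search = len(searches) < max_searches  # guard: cap the number of searches
        action, payload = model(question, docs, can_search)
        if action == "search" and can_search:
            searches.append(payload)
            for doc in search(payload):
                if doc not in docs:  # keep each retrieved document once
                    docs.append(doc)
            continue
        # Either the model answered, or it asked to search past the cap.
        answer = payload if action == "answer" else "Search budget exhausted."
        break
    # Citations are the ids actually retrieved, not ids the model claims.
    return {"answer": answer, "sources": [doc_id for doc_id, _ in docs], "searches": searches}
```

With these stand-ins, `plain_rag` cites only D1 and cannot name the leader, while `agentic_rag` makes two searches (the question, then "who leads Payments"), cites D1 and D3, and makes none for "hello". Passing `max_searches=1` stops it after the first hop, which is the kind of guard the design notes recommend.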
## Design notes
- Retrieval is a tool, not a fixed step. That single change (the M9 loop around `search`) is what makes RAG agentic: the agent decides whether to retrieve, what to retrieve, and how many times.
- Citations come from the document ids the agent actually retrieved, so the answer is verifiable and hallucinations are easier to spot.
- Keyword search is a stand-in. `corpus.search` matches words, not meaning. Production puts the M7 vector store (embeddings) behind the same `search(query)` tool; the loop is unchanged. That is the one swap from this teaching version to a real research agent.
- Cost and guards. Each search-and-reason step is another model call (M20), so cap searches (`max_searches`, related to M22's step cap) and track retrieval counts in your observability tooling.
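The third note says the keyword `search` can be replaced by a vector store without touching the loop. A minimal sketch of that swap, assuming brute-force cosine similarity over an in-memory list (a real M7 vector store would use an embedding model and typically an approximate nearest-neighbour index). `VectorSearch`, `toy_embed`, and the `CONCEPTS` table are hypothetical: the toy embedding hand-assigns synonyms to shared axes purely to mimic what a learned embedding does, and the corpus text is the same invented stand-in for `corpus.py`.

```python
import math
import re
from typing import Callable

Doc = tuple[str, str]  # (doc_id, text)

# Hypothetical three-document stand-in for corpus.py.
CORPUS = {
    "D1": "Billing is run by the Payments team.",
    "D2": "The Search team maintains the internal wiki and docs index.",
    "D3": "Dana Okafor leads the Payments team.",
}

# Toy "embedding": hand-picked concept axes so synonyms share a dimension.
# A real embedding model (M7) learns such similarities; this table only mimics it.
CONCEPTS = {
    "billing": 0, "payments": 0, "invoices": 0,
    "leads": 1, "heads": 1, "manages": 1, "boss": 1,
    "team": 2, "group": 2,
    "wiki": 3, "docs": 3,
}
DIM = 4


def toy_embed(text: str) -> list[float]:
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in CONCEPTS:
            vec[CONCEPTS[word]] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # zero vectors (no known words) score 0


class VectorSearch:
    """Brute-force vector search exposing the same search(query, k) shape."""

    def __init__(self, corpus: dict[str, str], embed: Callable[[str], list[float]]):
        self.embed = embed
        # Index time: embed every document once and keep the vectors.
        self.index = [(doc_id, text, embed(text)) for doc_id, text in corpus.items()]

    def search(self, query: str, k: int = 2) -> list[Doc]:
        q = self.embed(query)
        scored = [(cosine(q, vec), doc_id, text) for doc_id, text, vec in self.index]
        scored = [s for s in scored if s[0] > 0.0]  # drop documents with no similarity
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [(doc_id, text) for _, doc_id, text in scored[:k]]
```

Because `VectorSearch(CORPUS, toy_embed).search` returns the same list of `(doc_id, text)` pairs as the keyword version, it can be handed to the loop wherever `search` is expected. In this toy setup the query "Who is the boss of the group that handles invoices?" shares no content word with D3 yet ranks it first, which is the behaviour keyword matching cannot provide.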
## Verified (offline)
- corpus: the original question retrieves only `D1`; "who leads Payments" retrieves `D3`; a greeting retrieves nothing.
- plain_rag: one retrieval, sources `[D1]`, answer cannot name the leader.
- agentic_rag: two searches, sources `[D1, D3]`, answer names Dana Okafor with citations; zero searches for small talk.
- All files compile; `demo_mock.py` runs end to end offline. Live runs reuse the M4 key and cost tokens.