
M24 solution: agentic RAG and research agents

A side-by-side of plain RAG (retrieve once, answer) and agentic RAG (retrieval is a tool the agent calls as many times as it needs), over a tiny synthetic corpus. The core runs offline with a mock model and a keyword search, so no API key and no tokens are needed.

Files

| File | Role |
| --- | --- |
| corpus.py | A small knowledge base and a search(query, k) tool (keyword similarity). Designed so that one question is multi-hop: D1 says billing is run by the Payments team, D3 says who leads Payments. |
| agent.py | plain_rag(question) retrieves once, then answers. agentic_rag(question) runs the M9 loop with search as a tool, searching again with a refined query and citing the documents used. Both return sources; agentic_rag also returns the list of searches. The client is injectable. |
| demo_mock.py | Runs both on a multi-hop question and shows plain RAG fall short, agentic RAG answer with citations, and the agent skip search for small talk. Start here. |
| ../starters/add_corpus_doc.py | Add documents that create a three-hop question. |
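To make the search(query, k) idea concrete, here is a minimal sketch of a keyword-overlap search over three documents. The document texts, the stopword list, the scoring rule, and the sample question are all assumptions made for illustration; the real corpus.py may word and score things differently.

```python
import re

# Hypothetical corpus: ids follow the README, texts are made up for this sketch.
CORPUS = {
    "D1": "Billing is run by the Payments team.",
    "D2": "Support answers customer tickets within one day.",
    "D3": "Dana Okafor leads the Payments team.",
}
# Assumed stopword list; common words are ignored when matching.
STOPWORDS = {"a", "by", "in", "is", "of", "the", "who"}


def tokens(text):
    """Lowercase the text and return its set of non-stopword words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def search(query, k=1):
    """Return up to k document ids ranked by the number of shared keywords."""
    q = tokens(query)
    scored = [(len(q & tokens(text)), doc_id) for doc_id, text in CORPUS.items()]
    hits = [(score, doc_id) for score, doc_id in scored if score > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, then id
    return [doc_id for _, doc_id in hits[:k]]


print(search("Who is in charge of billing?"))  # ['D1']
print(search("who leads Payments"))            # ['D3']
print(search("hello"))                         # []
```

The multi-hop property falls out of the word overlap: the billing question shares a keyword only with D1, while D3 becomes reachable only once "Payments" appears in the query.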

Run it

# offline, free, no key (local corpus + mock model):
python demo_mock.py

# live (optional, costs a few tokens): put your key in .env first
cp ../starters/.env.example .env      # then edit .env and paste your key
python agent.py

What the demo shows

| Run | Searches | Sources | Answer |
| --- | --- | --- | --- |
| plain_rag | 1 (fixed) | [D1] | states the team, cannot name the leader |
| agentic_rag | 2 (question, then "who leads Payments") | [D1, D3] | names Dana Okafor, cites D1 and D3 |
| agentic_rag on "hello" | 0 | none | answers directly, no retrieval |

The point: the second search is only possible because the agent read the first result and refined its query. One-shot retrieval on the original question never reaches D3.
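The contrast can be sketched in a few lines. Everything below is a hypothetical stand-in rather than the contents of agent.py: the document texts, the sample question, the helper names read_leader and read_team, and the scripted mock_model policy (which plays the role of the model by reading the retrieved text and deciding whether to search again) are assumptions for illustration.

```python
import re

# Hypothetical two-document corpus; texts are illustrative only.
DOCS = {
    "D1": "Billing is run by the Payments team.",
    "D3": "Dana Okafor leads the Payments team.",
}
STOP = {"who", "is", "in", "of", "the", "by"}


def words(text):
    return set(re.findall(r"[a-z]+", text.lower())) - STOP


def search(query):
    """Toy top-1 keyword search: ids of the best-overlapping docs, or []."""
    scores = {d: len(words(query) & words(t)) for d, t in DOCS.items()}
    best = max(scores.values())
    return [d for d, s in scores.items() if s == best and s > 0]


def read_leader(text):
    m = re.search(r"(\w+ \w+) leads the (\w+) team", text)
    return f"{m.group(1)} leads the {m.group(2)} team." if m else None


def read_team(text):
    m = re.search(r"run by the (\w+) team", text)
    return m.group(1) if m else None


def plain_rag(question):
    """Retrieve once, then answer from whatever came back."""
    sources = search(question)
    text = " ".join(DOCS[d] for d in sources)
    team = read_team(text)
    answer = read_leader(text) or (
        f"The {team} team runs billing, but the retrieved text does not name its leader."
        if team else "No relevant document found."
    )
    return {"answer": answer, "sources": sources}


def mock_model(question, sources, searches):
    """Scripted policy standing in for an LLM: returns ("search", q) or ("answer", text)."""
    if question.lower().strip(" !.?") in {"hello", "hi"}:
        return ("answer", "Hello! What would you like to know?")  # small talk: no retrieval
    if not searches:
        return ("search", question)
    text = " ".join(DOCS[d] for d in sources)
    leader = read_leader(text)
    if leader:
        return ("answer", f"{leader} Sources: {', '.join(sources)}")
    team = read_team(text)
    if team and f"who leads {team}" not in searches:
        return ("search", f"who leads {team}")  # query refined from the first result
    return ("answer", "The retrieved documents do not name the leader.")


def agentic_rag(question, max_steps=5):
    """Loop: ask the model what to do, run the search tool, repeat until it answers."""
    searches, sources = [], []
    for _ in range(max_steps):
        action, payload = mock_model(question, sources, searches)
        if action == "answer":
            return {"answer": payload, "sources": sources, "searches": searches}
        searches.append(payload)
        sources += [d for d in search(payload) if d not in sources]
    return {"answer": "Step cap reached before an answer.", "sources": sources, "searches": searches}


question = "Who is in charge of billing?"
print(plain_rag(question))
print(agentic_rag(question))
print(agentic_rag("hello"))
```

In this sketch the refined query is built from the team name found in D1, which is why a single retrieval on the original question cannot produce it.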

Design notes

  • Retrieval is a tool, not a fixed step. That single change (the M9 loop around search) is what makes RAG agentic: the agent decides whether, what, and how many times to retrieve.
  • Citations come from the document ids the agent actually retrieved, so the answer is verifiable and hallucination is easier to spot.
  • Keyword search is a stand-in. corpus.search matches words, not meaning. Production puts the M7 vector store (embeddings) behind the same search(query) tool; the loop is unchanged. That is the one swap from this teaching version to a real research agent.
  • Cost and guards. Every extra search adds another reasoning step, and each step is another model call (M20), so cap searches (max_searches, related to M22's step cap) and watch retrieval counts in observability.
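One way to picture the notes above is a thin wrapper around the search tool. The sketch below is an assumption, not code from agent.py: BudgetedSearch, SearchBudgetExceeded, unsupported, and toy_backend are hypothetical names. The wrapper accepts any callable with the search(query) shape, so a keyword backend or a vector store could sit behind it without changing the loop; it enforces a max_searches cap, records each query for observability, and checks cited ids against what was actually retrieved.

```python
class SearchBudgetExceeded(RuntimeError):
    """Raised when the agent asks for more searches than the cap allows."""


class BudgetedSearch:
    """Wraps any search(query) callable. Hypothetical helper for illustration."""

    def __init__(self, backend, max_searches=3):
        self.backend = backend          # keyword search, vector store, etc.
        self.max_searches = max_searches
        self.queries = []               # every query issued, for observability
        self.retrieved = []             # every document id returned, first-seen order

    def __call__(self, query):
        if len(self.queries) >= self.max_searches:
            raise SearchBudgetExceeded(f"cap of {self.max_searches} searches reached")
        self.queries.append(query)
        doc_ids = list(self.backend(query))
        self.retrieved += [d for d in doc_ids if d not in self.retrieved]
        return doc_ids

    def unsupported(self, cited):
        """Return cited ids that were never retrieved (a sign of a made-up citation)."""
        return [d for d in cited if d not in self.retrieved]


def toy_backend(query):
    # Stand-in backend for the sketch: any function with this shape would do.
    return ["D3"] if "leads" in query.lower() else ["D1"]


tool = BudgetedSearch(toy_backend, max_searches=2)
tool("who runs billing")
tool("who leads Payments")
print(len(tool.queries), tool.retrieved)   # 2 ['D1', 'D3']
print(tool.unsupported(["D1", "D7"]))      # ['D7']
```

Raising on the cap is only one possible choice; a loop could instead catch the exception and ask the model to answer from what it already has.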

Verified (offline)

  • corpus: the original question retrieves only D1; "who leads Payments" retrieves D3; a greeting retrieves nothing.
  • plain_rag: one retrieval, sources [D1], answer cannot name the leader.
  • agentic_rag: two searches, sources [D1, D3], answer names Dana Okafor with citations; zero searches for small talk.
  • All files compile; demo_mock.py runs end to end offline. Live runs reuse the M4 key and cost tokens.