M24 notes: Agentic RAG and research agents (the one idea)
The one idea: plain RAG retrieves once and answers. Agentic RAG makes retrieval a TOOL the agent controls, so it can decide whether to search, search again with a better query after reading the first results, and keep going until it has enough to answer, with citations. It is the M9 agent loop with a search tool, and it answers questions one-shot RAG cannot.
1. Where plain RAG stops
Recall the RAG pattern from M7-M8: take the question, retrieve the top-k matching chunks, put them in the prompt, and answer. This is fast and works well for direct lookups ("what are the office hours?"). It has three blind spots:
- Multi-hop questions. "Who leads the team that runs billing?" The answer lives in two documents: one says billing is run by the Payments team, another says who leads Payments. A single retrieval on the original question finds the first but not the second, because you do not yet know to search for "Payments lead". One shot is not enough.
- Bad first query. If the user's wording does not match the documents, the single retrieval misses, and plain RAG has no way to rephrase and try again.
- Over-retrieval. Plain RAG retrieves for every question, even "hello", wasting tokens on context that is not needed.
The lab shows the first one directly: plain RAG retrieves D1 ("billing is run by the Payments team") and then cannot name the leader, because D3 ("Payments is led by Dana Okafor") was never retrieved.
2. The fix: retrieval becomes a tool
Give the agent a search(query) tool and let the M9 loop run. Now the agent can:
- decide whether the question even needs the knowledge base (skip search for small talk),
- search, read the results, and judge whether they answer the question,
- if not, refine the query using what it just learned and search again,
- answer once it has enough, citing the documents it used.
In agent.py, agentic_rag does exactly this. On the multi-hop question it searches the original
question (finds D1, learns "Payments team"), then searches "who leads Payments" (finds D3, learns
"Dana Okafor"), then answers with sources [D1, D3]. Two hops, because it could look twice.
Analogy. Plain RAG is asking a librarian one question and reading whatever they hand back. Agentic RAG is doing research: you read the first source, it points you to a second, you go find that, and you write your answer with a list of references. The agent does the legwork of following the trail.
3. Citations are not optional
Because the agent retrieved real documents, it can (and should) say WHICH ones it used. Citations do
two jobs: they let a human verify the answer, and they make hallucination obvious (if the answer claims
something no cited document supports, that is a red flag). agentic_rag tracks every document id it
retrieved and the answer references them. Grounded-with-citations is the trustworthy shape for any
research or question-answering agent.
4. Keyword search here vs embeddings in production
To run offline and free, corpus.py searches by shared keywords (the same idea as M21's recall). That
is enough to demonstrate multi-hop behaviour, but it is brittle: it matches words, not meaning. Real
research agents put the M7 vector store (Chroma, FAISS, hosted options) behind the same
search(query) tool, so retrieval matches by meaning and the second-hop query does not have to share
exact words with the document. The agent loop is identical; only the search tool gets smarter. Swap one
function and this module becomes production-grade retrieval.
5. Costs and limits (be honest)
Agentic RAG is more powerful but not free: each search plus each reasoning step is another model call and more tokens (M20 shows the bill). Guard it the way you guard any agent:
- a step cap so it cannot search forever (M22),
- observability so you can see how many searches a question took (M20),
- and remember it can still be wrong: it can retrieve the wrong document, or stop searching too early. Citations plus evals (M20) are how you catch that.
Use plain RAG when a single lookup answers the question; reach for agentic RAG when questions are multi-step, exploratory, or need the agent to decide what to look for next.
Words you will hear
Agentic RAG, multi-hop question, query refinement / re-retrieval, retrieval as a tool, citations / grounding, research agent, vector store (M7), step cap (M22). Full definitions in the glossary.