Lab M24: build a research agent that searches more than once

You'll need: your venv with the anthropic and python-dotenv libraries from M4. The core lab needs no API key and costs nothing: it uses a small local corpus and a mock model. A live run at the end is optional. Time: about 45 minutes. Work in your breakout pair.

Heads up: this fuses RAG (M7-M8) with the agent loop (M9). The new idea is small: make search a tool the agent can call more than once. The corpus is tiny and synthetic. Nothing here can harm your computer.

This lab has two parts:
- Part A: watch plain RAG fall short on a multi-hop question.
- Part B: run the agentic research agent that searches, reads, searches again, and cites sources.

flowchart LR
  Q["question"] --> AG["research agent"]
  AG -->|search 1| KB[("knowledge base")]
  KB -->|D1: billing -> Payments| AG
  AG -->|search 2: who leads Payments| KB
  KB -->|D3: Payments led by Dana| AG
  AG --> ANS["answer + citations [D1, D3]"]

Part A: where plain RAG stops

Step 1: Set up

Copy the solution/ files and starters/.env.example into a folder. Activate your venv.

python -c "import anthropic, dotenv; print('deps ok')"

You should now see: deps ok. (If not: pip install anthropic python-dotenv, the M4 libraries.)

Step 2: Look at the knowledge base

Open corpus.py. Note that answering "who leads the team that runs billing?" needs two documents: D1 (billing is run by the Payments team) and D3 (Payments is led by Dana Okafor).

You should now see: no single document answers the question. You must read D1 to know that the next thing to look for is the Payments lead. That is a multi-hop question.

Step 3: Run the demo and read the plain-RAG result

python demo_mock.py

You should now see, under PLAIN RAG:

  sources: ['D1']
  answer: The billing service is run by the Payments team. I could not find who leads it.

Plain RAG retrieved once on the question, got D1, and stopped. It never learned to search for the Payments lead, so it cannot finish the answer. One retrieval is not enough for a multi-hop question.

Part B: the agentic research agent

Step 4: Read the agentic result

In the same output, look at AGENTIC RAG:

  searches: ['Who leads the team that runs the billing service?', 'who leads Payments']
  sources: ['D1', 'D3']
  answer: Dana Okafor leads the Payments team, which runs the billing service. [D1, D3]

The agent searched twice: first the original question (found D1, learned "Payments team"), then a refined query "who leads Payments" (found D3, learned "Dana Okafor"). Then it answered, citing D1 and D3. Open agent.py and read agentic_rag: search is a tool in the loop, and the agent calls it again after reading the first result.

You should now see: the second search only became possible because the agent read the first result. That is the whole point: retrieval the agent controls, not a single fixed step.
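The search-read-search-again loop can be sketched without any model at all. This is a minimal illustration, not the contents of agent.py: the names search, mock_model, and research are hypothetical, retrieval is a hand-scripted lookup, and only the D1 and D3 texts come from the lab.

```python
# Hypothetical sketch; the lab's agent.py and corpus.py may be organised differently.
DOCS = {
    "D1": "The billing service is run by the Payments team.",
    "D3": "The Payments team is led by Dana Okafor.",
}


def search(query):
    """Scripted retrieval: return one document ID for the query, or None."""
    q = query.lower()
    if "payments" in q:
        return "D3"
    if "billing" in q:
        return "D1"
    return None


def mock_model(question, notes):
    """Stand-in for the LLM: pick the next action from what has been read so far."""
    if "D3" in notes:
        return ("answer", "Dana Okafor leads the Payments team, "
                          "which runs the billing service. [D1, D3]")
    if "D1" in notes:
        # Reading D1 revealed the team name, so a sharper query is now possible.
        return ("search", "who leads Payments")
    return ("search", question)


def research(question, max_searches=3):
    """Ask the model for an action, run any search it requests, stop on an answer."""
    notes, searches = {}, []
    while len(searches) < max_searches:
        action, payload = mock_model(question, notes)
        if action == "answer":
            return {"searches": searches, "sources": sorted(notes), "answer": payload}
        searches.append(payload)
        doc_id = search(payload)
        if doc_id:
            notes[doc_id] = DOCS[doc_id]
    # Search budget spent: give the model one last chance to answer.
    action, payload = mock_model(question, notes)
    if action != "answer":
        payload = "I could not find who leads the team."
    return {"searches": searches, "sources": sorted(notes), "answer": payload}


result = research("Who leads the team that runs the billing service?")
print(result["searches"])
print(result["sources"])
print(result["answer"])
```

Calling research(..., max_searches=1) on the same question stops after D1 and returns the fallback answer, which is the plain-RAG behaviour from Part A expressed as a one-iteration loop.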

Step 5: See it decide NOT to search

In the same output, look at the last section:

==== AGENTIC RAG knows when NOT to search ====
  searches: []
  answer: Hello. Ask me about our teams and services.

For "Hello, how are you?" the agent did zero searches. A good research agent does not retrieve when retrieval would not help, which saves tokens (M20).

Step 6: Prove the second hop yourself

The agent's second search works because, after reading D1, the query now contains "Payments". Check the retrieval directly:

python -c "import corpus; print([i for i,_,_ in corpus.search('who leads Payments')])"

You should now see: a list including D3. The original question alone does not retrieve D3 (try it: corpus.search('who leads the team that runs the billing service') returns only D1). That gap is why the agent has to search twice.
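To see why the original question misses D3, it can help to reproduce the gap on a toy keyword index. The sketch below is an assumption-laden stand-in for corpus.py, not its real contents: the scoring (keyword overlap), the stopword list, the tiny stem map, and the (id, score, text) tuple shape are all hypothetical, and it returns only the single best hit (k=1).

```python
# Toy stand-in for corpus.py (hypothetical; the lab's real file may differ).
DOCS = {
    "D1": "The billing service is run by the Payments team.",
    "D3": "The Payments team is led by Dana Okafor.",
}
STOPWORDS = {"the", "is", "by", "who", "that", "a"}
# Crude normalisation so "leads"/"led" and "runs"/"run" count as the same word.
STEMS = {"leads": "lead", "led": "lead", "runs": "run"}


def tokens(text):
    """Lowercase, strip punctuation, drop stopwords, apply the tiny stem map."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return {STEMS.get(w, w) for w in words if w and w not in STOPWORDS}


def search(query, k=1):
    """Return the top-k (doc_id, score, text) tuples ranked by keyword overlap."""
    q = tokens(query)
    scored = [(doc_id, len(q & tokens(text)), text) for doc_id, text in DOCS.items()]
    scored = [hit for hit in scored if hit[1] > 0]
    scored.sort(key=lambda hit: hit[1], reverse=True)
    return scored[:k]


first = [i for i, _, _ in search("Who leads the team that runs the billing service?")]
second = [i for i, _, _ in search("who leads Payments")]
print(first, second)  # ['D1'] ['D3']
```

The original question shares most of its keywords with D1, so D1 wins the single slot. Only a query that names "Payments" ranks D3 first, and that word is not available until D1 has been read.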

Step 7 (optional, costs a few tokens): run against the real model

Copy .env.example to .env, paste your key into it, then run the agent:

cp .env.example .env      # then edit .env and paste your key
python agent.py

You should now see: the PLAIN result missing the leader and the AGENTIC result naming Dana Okafor with citations. A real model writes its own queries, so the exact search strings may differ from the mock's, but the multi-hop behaviour is the same. Steps 1 to 6 need no key.
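For the live run, the model has to be told that a search tool exists. A rough sketch of how that might look follows, using the tool-definition shape of the Anthropic Messages API (name, description, input_schema). The description text, the handle_tool_call helper, and the JSON-encoded result are assumptions for illustration; agent.py may define its tool differently.

```python
import json

# Tool definition in the shape the Anthropic Messages API expects.
# It would be passed as tools=[SEARCH_TOOL] to client.messages.create(...).
SEARCH_TOOL = {
    "name": "search",
    "description": (
        "Search the team and service knowledge base. "
        "Call it again with a refined query if the first result is not enough."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Keywords to look up."}
        },
        "required": ["query"],
    },
}


def handle_tool_call(name, tool_input, search_fn):
    """Hypothetical helper: route one tool call to local code, return text for the model."""
    if name != SEARCH_TOOL["name"]:
        return f"unknown tool: {name}"
    hits = search_fn(tool_input["query"])
    return json.dumps(hits)


# Usage with a stub search function in place of corpus.search.
stub = lambda query: [["D3", "The Payments team is led by Dana Okafor."]]
print(handle_tool_call("search", {"query": "who leads Payments"}, stub))
```

The description is the agent's only hint about when to search again, so it is worth saying explicitly that repeated calls with a refined query are allowed.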

Step 8: Show it

Post the plain answer next to the agentic answer (with its two searches and citations) in the chat. Side by side, they show in one picture why a research agent beats one-shot retrieval on a multi-hop question.

If you get stuck

ModuleNotFoundError: anthropic -> pip install anthropic python-dotenv (M4 libraries).
demo_mock.py cannot find a module -> run it from inside the folder with the solution .py files.
Agentic answer is missing the leader -> the second search did not retrieve D3; check corpus.search('who leads Payments') returns D3 and read the loop in agent.py.
ANTHROPIC_API_KEY error in Step 7 -> your .env is not named exactly .env, or the key line is wrong. See api-keys.md. Steps 1 to 6 need no key.

Check yourself

Why can't plain RAG answer "who leads the team that runs billing?"

The answer needs two documents, but you only know to search for the second (the Payments lead) after reading the first (billing is run by Payments). A single retrieval on the original question finds D1, not D3, so plain RAG stops one hop short.

What changes to make RAG "agentic"?

Retrieval becomes a tool the agent calls inside the M9 loop, so the agent can search, read, refine its query, and search again as many times as it needs, then answer, instead of retrieving exactly once.

Why do citations matter for a research agent?

They let a human verify the answer and make hallucination easier to spot: if the answer claims something no cited document supports, that is a red flag. An answer grounded in cited sources is the one you can trust.
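Part of that check can be automated. The sketch below assumes citations appear as bracketed IDs such as [D1, D3], as in the demo output. The function names are hypothetical, and the check is deliberately weak: it confirms only that every cited ID was actually retrieved, not that the cited text supports the claim.

```python
import re


def cited_ids(answer):
    """Collect document IDs that appear inside square brackets, e.g. [D1, D3]."""
    ids = []
    for group in re.findall(r"\[([^\]]*)\]", answer):
        ids.extend(part.strip() for part in group.split(",") if part.strip())
    return ids


def unsupported_citations(answer, sources):
    """Return cited IDs that were never retrieved; an empty list means none were found."""
    return [doc_id for doc_id in cited_ids(answer) if doc_id not in sources]


answer = "Dana Okafor leads the Payments team, which runs the billing service. [D1, D3]"
print(unsupported_citations(answer, ["D1", "D3"]))  # []
print(unsupported_citations(answer, ["D1"]))        # ['D3']
```

A non-empty result means the answer cites a document the agent never read, which is a cheap signal worth flagging before a human reviews the content.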

When should you still prefer plain RAG?

When a single lookup answers the question (a direct fact). Agentic RAG costs more (each search and reasoning step is another model call), so save it for multi-step, exploratory, or query-refining questions.