Lab: M7: build Q&A over your own document

You'll need: your M4 setup (venv, key in .env, anthropic + python-dotenv), plus a new install: Chroma. A plain-text document you care about helps for Part B. Time: ~60 min • Work in your breakout pair.

Heads up: one new install today (Chroma). If pip install chromadb throws a build error, your Python is likely too new; switch to Python 3.12 (see the vector-store guide). The first run downloads a small embedding model once. Errors are normal and safe.

This lab has two parts:
- Part A: install Chroma, index the sample document, and see it answer from the document.
- Part B: understand the three RAG steps, then point it at your document.

flowchart LR
  Q["your question"] --> R["1 · RETRIEVE<br/>vector store finds<br/>the most relevant chunks"]
  Doc["your document<br/>(chunked)"] --> R
  R --> A["2 · AUGMENT<br/>paste chunks into the prompt"]
  A --> G["3 · GENERATE<br/>Claude answers from them"]
  G --> Ans["grounded answer"]

Part A: get RAG running

Step 1: Install Chroma and check it

With your venv active:

pip install chromadb
python -c "import chromadb; print('Chroma', chromadb.__version__)"

You should now see: Chroma 1.x.x. (Build error mentioning pandas/numpy? Your Python is likely too new; switch to Python 3.12, then retry. Full steps are in the guide.)
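
If you'd rather check your interpreter before installing, a small sketch like the one below flags versions newer than the 3.12 this lab recommends. The cut-off is taken from the lab's own guidance, not from an official Chroma limit, and python_is_too_new is a hypothetical helper name.

```python
import sys

# Newest Python version this lab recommends for installing Chroma.
# Assumption: taken from the lab text, not from an official Chroma limit.
NEWEST_RECOMMENDED = (3, 12)


def python_is_too_new(version_info=sys.version_info, newest=NEWEST_RECOMMENDED):
    """Return True if the (major, minor) version is above the recommended one."""
    return tuple(version_info[:2]) > newest


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if python_is_too_new():
        print(f"Python {major}.{minor} may be too new for this lab; try 3.12.")
    else:
        print(f"Python {major}.{minor} should be fine.")
```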

Step 2: Set up the folder

Put rag.py and sample_notes.txt (from solution/) and rag_starter.py (from starters/) in a folder with your M4 .env. Activate your venv.

You should now see: (.venv) and those files in the folder (ls / dir).

Step 3: Run it and ask a question

python rag.py

(The first run downloads Chroma's small embedding model. This happens once and needs an internet connection.) When it says "Ask questions", type: What time does the café open?

You should now see: the line "Split the document into 8 chunks.", then a "[retrieved 3 relevant chunks]" line, then an answer like "It opens at 7:00 AM on weekdays and 8:00 AM on weekends, and closes at 6:00 PM.", pulled straight from the document. You just did RAG.
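
The "Split the document into 8 chunks." line comes from a chunking step. How rag.py actually chunks isn't shown in this handout. One common approach, sketched here as an assumption, is to split on blank lines so each paragraph becomes a chunk. The name split_into_chunks and the sample text are hypothetical.

```python
def split_into_chunks(text):
    """Split text into paragraph chunks, using blank lines as boundaries."""
    chunks = []
    for block in text.split("\n\n"):
        cleaned = " ".join(block.split())  # collapse newlines and extra spaces
        if cleaned:                        # skip empty blocks
            chunks.append(cleaned)
    return chunks


# Tiny stand-in document (not the real sample_notes.txt).
sample = """Opening hours
We open at 7:00 AM on weekdays.

Refunds
Same-day refunds, no questions asked.
"""

chunks = split_into_chunks(sample)
print(f"Split the document into {len(chunks)} chunks.")
for chunk in chunks:
    print("-", chunk)
```

Paragraph-sized chunks keep related sentences together, which tends to help retrieval. Other strategies (fixed character counts, overlapping windows) are equally plausible for rag.py.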

Step 4: Prove it's grounded (not guessing)

Ask something the document doesn't cover: Who is the café's CEO?

You should now see: roughly "I don't know based on the document." The app answers only from your document and admits when the answer isn't there. That honesty is the point of RAG.
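
The "I don't know" behaviour usually comes from instructions in the prompt rather than from the model holding back on its own. Here is a minimal sketch, assuming rag.py's prompt looks roughly like this. The exact wording in rag.py may differ, and build_prompt is a hypothetical name.

```python
def build_prompt(question, chunks):
    """Build a prompt that restricts the answer to the retrieved chunks."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, reply exactly: "
        "\"I don't know based on the document.\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_prompt(
    "Who is the café's CEO?",
    ["We open at 7:00 AM on weekdays.", "Same-day refunds, no questions asked."],
)
print(prompt)
```

Neither chunk mentions a CEO, so a model following these instructions has nothing to answer from and should fall back to the fixed reply.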

Part B: understand it, then use your own document

Step 5: Read the three steps

Open rag.py. Match each function to a RAG step:
- retrieve() → 1. Retrieve (Chroma finds the closest chunks)
- the Context: prompt in answer() → 2. Augment (chunks go into the prompt)
- client.messages.create(...) → 3. Generate (Claude answers)

You should now see / say: "RAG = retrieve the right chunks, augment the prompt with them, generate an answer from them." Three steps: that's the whole idea.
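
The three steps can be sketched as one function. This is not the code in rag.py: here retrieve and generate are passed in as plain callables so the flow is visible without Chroma or an API key, and the fake versions below are stand-ins for illustration only.

```python
def answer(question, retrieve, generate, num_chunks=3):
    """Run the three RAG steps: retrieve, augment, generate."""
    # 1. Retrieve: get the chunks most relevant to the question.
    chunks = retrieve(question, num_chunks)
    # 2. Augment: paste those chunks into the prompt.
    prompt = "Context:\n" + "\n\n".join(chunks) + f"\n\nQuestion: {question}"
    # 3. Generate: let the model answer from the augmented prompt.
    return generate(prompt)


# Stand-ins so the sketch runs offline. In the lab, Chroma does the
# retrieving and Claude does the generating.
def fake_retrieve(question, num_chunks):
    notes = [
        "We open at 7:00 AM on weekdays.",
        "Same-day refunds, no questions asked.",
    ]
    return notes[:num_chunks]


def fake_generate(prompt):
    return f"(model would answer from a prompt of {len(prompt)} characters)"


print(answer("What time does the café open?", fake_retrieve, fake_generate))
```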

Step 6: See meaning-based search (not keywords)

Run rag.py again and ask: Can I get my money back on a coffee?

You should now see: it finds the Refunds section and answers (same-day refund, no questions asked), even though your question never said the word "refund". The vector store matched on meaning ("money back" ≈ "refund"). That's what embeddings buy you over keyword search.
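
To see why "money back" can land on the Refunds section, it helps to look at how embeddings are compared. The vectors below are made-up three-number toys; real embeddings are far longer and come from the model Chroma downloads. Cosine similarity is one common way to compare them, and Chroma's default distance measure may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings, invented for illustration only.
question = [0.9, 0.1, 0.0]  # "Can I get my money back on a coffee?"
chunks = {
    "Refunds": [0.8, 0.2, 0.1],
    "Opening hours": [0.0, 0.1, 0.9],
}

scores = {name: cosine_similarity(question, vec) for name, vec in chunks.items()}
best = max(scores, key=scores.get)
print(best)  # the section whose vector points closest to the question's
```

The question and the Refunds chunk share no keyword here. They match only because their vectors point in nearly the same direction.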

Step 7: Point it at YOUR document

Save a document you care about as a .txt file (export notes, a manual, a policy) into the folder. Open rag_starter.py, set DOC_FILE to your filename (TODO 1), then run:

python rag_starter.py

Ask it real questions you'd actually want answered.

You should now see: your own document answering your own questions. If an answer is off, look at what got retrieved; fixing that is exactly what M8 is about.
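
When an answer is off, printing what was retrieved is the quickest check. Chroma's collection.query(...) returns a dictionary whose "documents" and "distances" entries hold one inner list per query. The helper below (a hypothetical name, not part of rag_starter.py) formats that shape. It runs on a hand-written result here, so no collection is needed.

```python
def format_retrieved(results):
    """Return one line per retrieved chunk, in the order given, with its distance."""
    documents = results["documents"][0]  # inner list for the first (only) query
    distances = results["distances"][0]
    lines = []
    for rank, (doc, dist) in enumerate(zip(documents, distances), start=1):
        preview = doc if len(doc) <= 60 else doc[:57] + "..."
        lines.append(f"{rank}. (distance {dist:.2f}) {preview}")
    return lines


# Hand-written example in the same shape as a Chroma query result.
# The distances are invented for illustration.
example = {
    "documents": [[
        "Same-day refunds, no questions asked.",
        "We open at 7:00 AM on weekdays.",
    ]],
    "distances": [[0.41, 1.27]],
}

for line in format_retrieved(example):
    print(line)
```

A smaller distance means a closer match. If the chunk you expected is missing from the list, or sits far down it, the problem is in retrieval rather than in Claude's answer.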

Stuck? The finished app is ../solution/rag.py. Peek only after you've tried.

Your win

You built a Q&A app over a document you chose. It retrieves the relevant parts, answers only from them, and admits when it doesn't know.

Post it to the chat wins board: a real question + answer from your doc, e.g. "Asked my lease 'how much notice to move out?' → 'Two months, per section 4.' My AI reads my documents now."

Take-home (optional)

Try changing NUM_CHUNKS in rag_starter.py from 3 to 1, then to 5, and re-ask a question that needs info from two different sections. Notice how the number of chunks you retrieve changes the answer. That's a first taste of M8, where we make retrieval actually good.
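
The effect of NUM_CHUNKS can be sketched without a vector store. Assume the retriever has already ranked the sections for a two-part question. The section names and ranking below are invented, and sections_in_prompt is a hypothetical helper. NUM_CHUNKS only decides how far down that ranked list the prompt reaches.

```python
# Sections already ranked for the question
# "When do you open, and can I get a refund?" (ranking invented for illustration).
ranked_sections = ["Opening hours", "Menu", "Refunds", "Wi-Fi", "Parking"]

# The two sections the question actually needs.
needed = {"Opening hours", "Refunds"}


def sections_in_prompt(ranked, num_chunks):
    """Return the top num_chunks sections, i.e. what reaches the prompt."""
    return ranked[:num_chunks]


for num_chunks in (1, 3, 5):
    got = sections_in_prompt(ranked_sections, num_chunks)
    covered = needed <= set(got)  # True if both needed sections are included
    print(num_chunks, got, "covers both" if covered else "misses a section")
```

With one chunk the prompt sees only half of what the question needs. With five it sees both halves plus unrelated sections, which is the trade-off to watch for.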