M28: Agent UX and streaming to a UI (Part D: Agentic Systems)

A correct answer delivered after eight seconds of blank screen feels broken. The same answer, with "searching..." then the words appearing live, feels fast and trustworthy, even if it took exactly as long. How an agent FEELS to use is part of whether it is good. Today you build the UX layer: the agent stops going silent and instead streams what it is doing (thinking, searching, writing) and the answer token by token, surfaces its citations and cost, and lets the user cancel a long run.

Today's win: an agent that streams its progress and its answer live to a UI, shows sources and cost, and can be cancelled mid-run, demonstrated offline and over a real Server-Sent Events endpoint.

Today you will

See why perceived latency matters as much as real latency
Turn the agent into an event stream: status, tool, citation, token, cost, done
Stream the answer token by token so it appears live instead of all at once
Surface citations and cost in the UI (from M24 and M20/M25)
Support cancellation, and serve the stream over Server-Sent Events (SSE) with FastAPI

Run of show (about 55 minutes)

Time	What we do
0:00	Hook: the blank screen problem
0:05	The one idea: emit events as you work, render them live (read `notes.md`)
0:12	Lab Part A: run the streamed agent and watch progress, tokens, citations, cost
0:30	Lab Part B: cancellation, then serve it over SSE and read the wire format
0:50	Show: post your streamed run
0:55	Wrap

If you get stuck

Builds on M6 (streaming), M24 (citations), M20/M25 (cost), and M11/M18 (serving). The core lab runs offline, free, no key (a streaming mock). The SSE step needs fastapi plus uvicorn (from M11).
The key trick is a generator: chat_stream yields events as it works instead of returning one blob. If a consumer stops iterating, the agent stops, that is cancellation for free.
Nothing here can harm your computer. Read events.py to see the whole event vocabulary in one place.

Optional challenge

Open starters/add_event.py and add a UX event of your own: a step-progress indicator ("step 2 of 3"), a typing indicator before the first token, or a retry notice when a transient error is recovered (M22). Small touches like these are most of what makes an agent feel polished.