M28 solution: agent UX and streaming to a UI

An agent that emits UX events as it works (a generator) instead of returning one final blob. The events are rendered live and served over Server-Sent Events (SSE). The core runs offline against a streaming mock model: no API key, no spend.

Files

| File | Role |
| --- | --- |
| `events.py` | The event vocabulary (`status`, `tool`, `citation`, `token`, `cost`, `done`, `error`) and a terminal `render` that shows a stream like a simple UI. |
| `corpus.py` | A tiny knowledge base + search, so citations in the UI are real (M24). |
| `mockmodel.py` | A deterministic fake model with a streaming interface (`turn` returns a search plan or an answer as a list of chunks), so the UX runs offline. |
| `streaming_agent.py` | `StreamingAgent.chat_stream(question)`: a generator that yields events as it thinks, searches, and streams the answer, then reports cost. |
| `app.py` | FastAPI `/chat/stream` that serializes the event stream as Server-Sent Events (`text/event-stream`). |
| `demo.py` | Offline tour: a streamed multi-step answer, a simple answer, and a cancellation. Start here. |
| `../starters/add_event.py` | Add your own UX event. |

Run it

python demo.py            # offline: progress, live tokens, citations, cost, cancellation

pip install fastapi "uvicorn[standard]"     # serve the stream
uvicorn app:app --reload
curl -N -X POST localhost:8000/chat/stream -H 'Content-Type: application/json' -d '{"message":"Who leads billing?"}'
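
To read the same stream from Python instead of curl, the frames can be parsed by hand. This is a minimal sketch, assuming each frame is one or more `data:` lines closed by a blank line; `parse_sse` and the payload fields (`type`, `text`) are hypothetical names chosen for illustration, not taken from the repo.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON event per SSE frame.

    Collects `data:` lines until a blank line ends the frame.
    """
    data = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            # Drop the field name and the optional space after the colon.
            data.append(line[len("data:"):].lstrip(" "))
        elif line == "" and data:
            yield json.loads("\n".join(data))
            data = []


# Hypothetical frames, shaped like what `curl -N` would print.
sample = [
    'data: {"type": "status", "text": "searching"}',
    "",
    'data: {"type": "token", "text": "Dana"}',
    "",
    'data: {"type": "done"}',
    "",
]
events = list(parse_sse(sample))
print([e["type"] for e in events])
```

Against a live server, `lines` would come from whatever line iterator the HTTP client exposes for a streamed response. A frame that is never closed by a blank line is not emitted, which is what lets a client ignore a partial frame left over after a cancelled request.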

How it works

The agent is a generator. `chat_stream` yields events in order: status, tool, citations, then the answer one token chunk at a time, then cost, then done. The renderer (a terminal here, a browser in production) displays each event as it arrives.
Streaming the answer is the headline perceived-speed win: words appear as they are generated, so time-to-first-token is short, instead of arriving all at once after a blank wait. Real builds use the SDK streaming API (M6) behind the same events.
Progress, citations, and cost are just more event types, so the user always sees what the agent is doing, what it used (M24), and what it cost (M20/M25).
Cancellation is free. Stop iterating and the generator never resumes, so the agent produces no more tokens and incurs no more cost. The demo stops after three tokens, and the cost and done events never fire.
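SSE serving. `app.py` wraps the stream in a `StreamingResponse` with media type `text/event-stream`; each event is a `data: <json>\n\n` frame, the format a browser `EventSource` parses. (`EventSource` only issues GET requests, so a browser client for the POST route shown in the curl example would read the same frames with `fetch` and a stream reader.)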
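
The points above can be sketched in a few lines. This is a toy stand-in, not the code in `streaming_agent.py` or `app.py`: the event field names (`type`, `text`, `name`, `query`, `id`, `usd`), the `log` parameter, the `to_sse` helper, and the cost figure are all assumptions made for illustration.

```python
import json
from typing import Iterator


def chat_stream(question: str, log: list) -> Iterator[dict]:
    """Toy agent: yield UX events in order instead of returning one blob."""
    try:
        yield {"type": "status", "text": "searching"}
        yield {"type": "tool", "name": "search", "query": question}
        yield {"type": "citation", "id": "D1"}
        for chunk in ["Dana ", "Okafor ", "leads ", "billing."]:
            yield {"type": "token", "text": chunk}
        yield {"type": "cost", "usd": 0.0001}  # made-up number
        yield {"type": "done"}
    finally:
        # Runs on normal completion and when the generator is closed early.
        log.append("closed")


def to_sse(event: dict) -> str:
    """Serialize one event as a Server-Sent Events frame."""
    return f"data: {json.dumps(event)}\n\n"


# Cancellation: stop after three tokens, then close the generator.
log = []
stream = chat_stream("Who leads billing?", log)
seen = []
tokens = 0
for event in stream:
    seen.append(event["type"])
    if event["type"] == "token":
        tokens += 1
        if tokens == 3:
            break
stream.close()  # raises GeneratorExit at the paused yield, so `finally` runs

print(seen)
print(log)

# An uncancelled run reaches cost and done.
full = [e["type"] for e in chat_stream("Who leads billing?", [])]
print(full[-2:])
```

Because the generator is paused at a `yield` when the consumer stops, nothing after that point executes: the cost and done events are never built, and a real agent would issue no further model calls. Calling `close()` explicitly makes the cleanup deterministic rather than leaving it to garbage collection.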

Verified (offline)

The stream emits status -> tool -> citations -> tokens -> cost -> done; the assembled answer is "Dana Okafor leads the Payments team, which runs billing. [D1, D3]", and the emitted citations (D1, D3) match the ones cited in the answer text.
The answer arrives as 10 separate token chunks (live, not one blob); the cost event reports a positive cost.
Cancellation: stopping after 3 tokens means no cost or done event is produced.
`app.py` verified with FastAPI `TestClient`: `/health` returns ok; `/chat/stream` returns SSE frames starting with a status event and ending with done.
All files compile. Everything runs offline via the mock; a production build wraps the real SDK streaming API behind the same event stream, so the UI code is unchanged.
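
The "same event stream" claim can be illustrated with a small adapter. This is a sketch under assumptions: `answer_events`, `mock_chunks`, and `render_text` are hypothetical names, and no real SDK call is shown because the source does not name one.

```python
from typing import Iterable, Iterator


def answer_events(chunks: Iterable[str]) -> Iterator[dict]:
    """Turn any iterable of text chunks into token events plus a final done."""
    for text in chunks:
        yield {"type": "token", "text": text}
    yield {"type": "done"}


def mock_chunks() -> Iterator[str]:
    # Offline source: fixed chunks, standing in for the streaming mock.
    yield from ["Dana ", "Okafor ", "leads ", "Payments."]


def render_text(events: Iterable[dict]) -> str:
    """Stand-in for the UI: concatenate token text in arrival order."""
    return "".join(e["text"] for e in events if e["type"] == "token")


print(render_text(answer_events(mock_chunks())))
```

Swapping `mock_chunks()` for an iterator over a real SDK's streamed text would leave `answer_events` and `render_text` untouched, which is the property the last item above relies on.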