M28 solution: agent UX and streaming to a UI
An agent that yields UX events as it works (it is a generator) instead of returning one final blob; the events are rendered live and served over Server-Sent Events (SSE). The core runs offline against a streaming mock: no API key, no spend.
Files
| File | Role |
|---|---|
| `events.py` | The event vocabulary (`status`, `tool`, `citation`, `token`, `cost`, `done`, `error`) and a terminal renderer that displays a stream the way a simple UI would. |
| `corpus.py` | A tiny knowledge base plus search, so the citations shown in the UI are real (M24). |
| `mockmodel.py` | A deterministic fake model with a streaming interface (`turn` returns either a search plan or an answer as a list of chunks), so the UX runs offline. |
| `streaming_agent.py` | `StreamingAgent.chat_stream(question)`: a generator that yields events as it thinks, searches, and streams the answer, then reports cost. |
| `app.py` | A FastAPI `/chat/stream` endpoint that serializes the event stream as Server-Sent Events (`text/event-stream`). |
| `demo.py` | Offline tour: a streamed multi-step answer, a simple answer, and a cancellation. Start here. |
| `../starters/add_event.py` | Add your own UX event. |
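For orientation, here is a minimal sketch of what an event vocabulary and terminal renderer like the ones in `events.py` could look like. It is not the file's actual code: the `Event` dataclass, the `render` function, and the payload keys (`message`, `name`, `input`, `id`, `text`, `usd`) are assumptions made for illustration. Tokens are written without a newline and flushed immediately, so the answer grows on one line as it streams.

```python
from __future__ import annotations

import sys
from dataclasses import dataclass, field
from typing import Any, Iterable, TextIO

# Hypothetical mirror of the event types listed for events.py.
EVENT_TYPES = frozenset({"status", "tool", "citation", "token", "cost", "done", "error"})


@dataclass(frozen=True)
class Event:
    type: str
    data: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject anything outside the agreed vocabulary so UIs never see surprises.
        if self.type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.type!r}")


def render(events: Iterable[Event], out: TextIO | None = None) -> None:
    """Print events as they arrive, like a minimal terminal UI."""
    out = sys.stdout if out is None else out
    in_answer = False  # True while token text is accumulating on the current line
    for ev in events:
        if ev.type == "token":
            out.write(ev.data["text"])  # no newline: the answer grows in place
            in_answer = True
        else:
            if in_answer:
                out.write("\n")  # finish the answer line before the next UI element
                in_answer = False
            if ev.type == "status":
                out.write(f"... {ev.data['message']}\n")
            elif ev.type == "tool":
                out.write(f"[tool] {ev.data['name']}: {ev.data.get('input', '')}\n")
            elif ev.type == "citation":
                out.write(f"[source] {ev.data['id']}\n")
            elif ev.type == "cost":
                out.write(f"[cost] ${ev.data['usd']:.4f}\n")
            elif ev.type == "error":
                out.write(f"[error] {ev.data['message']}\n")
            elif ev.type == "done":
                out.write("[done]\n")
        out.flush()  # push each event to the screen immediately
```

Adding a new UX event in this sketch means extending `EVENT_TYPES` and giving `render` a branch for it; any renderer that ignores unknown branches keeps working.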
Run it
```bash
python demo.py  # offline: progress, live tokens, citations, cost, cancellation
pip install fastapi "uvicorn[standard]"  # serve the stream
uvicorn app:app --reload
curl -N -X POST localhost:8000/chat/stream -H 'Content-Type: application/json' -d '{"message":"Who leads billing?"}'
```
How it works
- The agent is a generator. `chat_stream` yields events in order: `status`, `tool`, citations, then the answer one `token` chunk at a time, then `cost`, then `done`. The renderer (the terminal here, a browser in production) displays each event as it arrives.
- Streaming the answer is the headline perceived-speed win: words appear live (short time-to-first-token) instead of after a blank wait. Real builds use the SDK streaming API (M6) behind the same events.
- Progress, citations, and cost are just more event types, so the user always sees what the agent is doing, what it used (M24), and what it cost (M20/M25).
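- Cancellation is free. Stop iterating and the generator is simply never resumed, so the agent produces no more tokens and incurs no more cost (calling `close()` on it, or letting it be garbage-collected, finalizes it). The demo stops after three tokens, and the `cost` and `done` events never fire.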
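- SSE serving. `app.py` wraps the stream in a `StreamingResponse` with `text/event-stream`; each event becomes a `data: <json>\n\n` frame in the format a browser `EventSource` parses. `EventSource` only issues GET requests, so for this POST endpoint a browser client reads the same frames from a `fetch()` response body instead.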
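The generator pattern and cancellation can be seen in a stripped-down, hypothetical version of `chat_stream`. It uses plain dicts as events and a hardcoded corpus and answer in place of `corpus.py` and `mockmodel.py`; `DOCS`, `ANSWER_CHUNKS`, `PRICE_PER_TOKEN_USD`, and `take_until_cancel` are illustrative names, and the price is made up. Because the `cost` and `done` yields sit after the token loop, a consumer that stops early never reaches them.

```python
from __future__ import annotations

from typing import Any, Generator

# Hypothetical stand-ins for corpus.py and mockmodel.py.
DOCS = {
    "D1": "Dana Okafor leads the Payments team.",
    "D3": "The Payments team runs billing.",
}
ANSWER_CHUNKS = ["Dana ", "Okafor ", "leads ", "the ", "Payments ", "team."]
PRICE_PER_TOKEN_USD = 0.00002  # made-up price, for illustration only


def chat_stream(question: str) -> Generator[dict[str, Any], None, None]:
    """Yield UX events in order; each step runs only when the consumer asks for more."""
    yield {"type": "status", "message": "Searching the knowledge base"}
    yield {"type": "tool", "name": "search", "input": question}
    for doc_id in DOCS:  # pretend the search matched every document
        yield {"type": "citation", "id": doc_id}
    yield {"type": "status", "message": "Writing the answer"}
    sent = 0
    for chunk in ANSWER_CHUNKS:
        sent += 1
        yield {"type": "token", "text": chunk}
    # Reached only if the consumer iterated past the last token.
    yield {"type": "cost", "usd": sent * PRICE_PER_TOKEN_USD}
    yield {"type": "done"}


def take_until_cancel(
    events: Generator[dict[str, Any], None, None], max_tokens: int
) -> list[dict[str, Any]]:
    """Simulate a user pressing Stop after `max_tokens` answer tokens."""
    seen: list[dict[str, Any]] = []
    tokens = 0
    try:
        for ev in events:
            seen.append(ev)
            if ev["type"] == "token":
                tokens += 1
                if tokens >= max_tokens:
                    break  # stop pulling: no further code in chat_stream runs
    finally:
        events.close()  # raises GeneratorExit at the paused yield and finalizes it
    return seen
```

Calling `close()` raises `GeneratorExit` at the paused `yield`, so a `try`/`finally` inside `chat_stream` would be the natural place to release resources such as an open model stream.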
Verified (offline)
- The stream emits `status` -> `tool` -> citations -> tokens -> `cost` -> `done`; the assembled answer is "Dana Okafor leads the Payments team, which runs billing. [D1, D3]" with matching citations `[D1, D3]`.
- The answer arrives as 10 separate `token` chunks (live, not one blob); the `cost` event reports a positive amount.
- Cancellation: stopping after 3 tokens means no `cost` or `done` event is produced.
- `app.py` verified with FastAPI `TestClient`: `/health` is ok; `/chat/stream` returns SSE frames starting with a `status` event and ending with `done`.
- All files compile. Everything runs offline via the mock; a production build wraps the real SDK streaming behind the same event stream (the UI code is unchanged).
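The SSE framing checked above can be reproduced without a server. This sketch assumes events are JSON-serializable dicts; `to_sse`, `sse_body`, and `parse_sse` are hypothetical helpers, not functions from `app.py`. In a FastAPI app, a generator like `sse_body(...)` would typically be handed to `StreamingResponse(sse_body(events), media_type="text/event-stream")`. The parser follows the basic SSE rules: a blank line ends a frame, `data:` lines carry the payload, and one leading space after the colon is dropped.

```python
from __future__ import annotations

import json
from typing import Any, Iterable, Iterator


def to_sse(event: dict[str, Any]) -> str:
    """Frame one event as an SSE message: a `data:` line followed by a blank line."""
    # json.dumps escapes newlines inside strings, so the payload fits on one line.
    return f"data: {json.dumps(event)}\n\n"


def sse_body(events: Iterable[dict[str, Any]]) -> Iterator[str]:
    """Lazily turn an event stream into SSE frames, one frame per event."""
    for event in events:
        yield to_sse(event)


def parse_sse(body: str) -> list[dict[str, Any]]:
    """Minimal client-side parser; assumes LF line endings."""
    events: list[dict[str, Any]] = []
    for frame in body.split("\n\n"):  # a blank line ends each frame
        data = [line[len("data:"):] for line in frame.split("\n") if line.startswith("data:")]
        if not data:
            continue  # trailing empty chunk, or a frame without data lines
        # Drop one leading space per line, then join multi-line data with '\n'.
        payload = "\n".join(d[1:] if d.startswith(" ") else d for d in data)
        events.append(json.loads(payload))
    return events
```

Keeping the framing in one small function means the event generator never needs to know it is being served over HTTP; the terminal renderer and the SSE endpoint consume the same stream.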