
M28 solution: agent UX and streaming to a UI

The agent is a generator that emits UX events as it works instead of returning one final blob. The events are rendered live and served over Server-Sent Events (SSE). The core runs offline against a streaming mock: no key, no spend.

Files

File                      Role
events.py                 The event vocabulary (status, tool, citation, token, cost, done, error) and a terminal renderer that shows a stream like a simple UI.
corpus.py                 A tiny knowledge base + search, so citations in the UI are real (M24).
mockmodel.py              A deterministic fake model with a streaming interface (turn returns a search plan or an answer as a list of chunks), so the UX runs offline.
streaming_agent.py        StreamingAgent.chat_stream(question): a generator that yields events as it thinks, searches, and streams the answer, then reports cost.
app.py                    FastAPI /chat/stream endpoint that serializes the event stream as Server-Sent Events (text/event-stream).
demo.py                   Offline tour: a streamed multi-step answer, a simple answer, and a cancellation. Start here.
../starters/add_event.py  Starter exercise: add your own UX event.

Run it

python demo.py            # offline: progress, live tokens, citations, cost, cancellation

pip install fastapi "uvicorn[standard]"     # serve the stream
uvicorn app:app --reload
curl -N -X POST localhost:8000/chat/stream -H 'Content-Type: application/json' -d '{"message":"Who leads billing?"}'

How it works

  • The agent is a generator. chat_stream yields events in order: status, tool, citations, then the answer one token chunk at a time, then cost, then done. The renderer (terminal here, a browser in production) displays each as it arrives.
  • Streaming the answer is the biggest perceived-speed win: the first words appear almost immediately (low time-to-first-token) instead of after a blank wait. Real builds use the SDK streaming API (M6) behind the same events.
  • Progress, citations, and cost are just more event types, so the user always sees what the agent is doing, what it used (M24), and what it cost (M20/M25).
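  • Cancellation is free. Stop iterating and close the generator (Python also closes it once it is garbage-collected); code after the last consumed yield never runs, so the agent produces no more tokens and no more cost. The demo stops after three tokens and the cost/done events never fire.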
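  • SSE serving. app.py wraps the stream in a StreamingResponse with text/event-stream; each event is a data: <json>\n\n frame, the format a browser EventSource parses. EventSource only issues GET requests, so a POST endpoint like the one in the curl example above is read in the browser with fetch and a streaming body reader instead.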
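
The three ideas above (events from a generator, cancellation by closing it, SSE framing) can be sketched in a few lines. This is a minimal illustration, not the actual streaming_agent.py or app.py: the Event dataclass, its kind/data fields, the JSON payload shape, the log parameter, the consume helper, and the cost figure are all assumptions made for the example. Only the event names and the data: <json>\n\n frame come from this README.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Iterator, List, Optional, Tuple


@dataclass
class Event:
    """One UX event. The kind/data layout is an assumption for this sketch."""
    kind: str
    data: Dict[str, Any] = field(default_factory=dict)


def chat_stream(question: str, log: List[str]) -> Iterator[Event]:
    """Hypothetical stand-in for StreamingAgent.chat_stream (offline, canned answer)."""
    try:
        yield Event("status", {"text": "searching"})
        yield Event("tool", {"name": "search", "query": question})
        yield Event("citation", {"ids": ["D1", "D3"]})
        for chunk in ["Dana ", "Okafor ", "leads ", "the ", "Payments ", "team."]:
            yield Event("token", {"text": chunk})
        yield Event("cost", {"usd": 0.0004})  # made-up figure, illustration only
        yield Event("done")
    finally:
        # Runs on normal completion and when the consumer closes the generator.
        log.append("closed")


def to_sse(event: Event) -> str:
    """Serialize one event as a single SSE frame: 'data: <json>' plus a blank line."""
    return f"data: {json.dumps(asdict(event))}\n\n"


def consume(max_tokens: Optional[int] = None) -> Tuple[List[str], List[str]]:
    """Iterate the stream, optionally cancelling after max_tokens token events."""
    log: List[str] = []
    seen: List[str] = []
    stream = chat_stream("Who leads billing?", log)
    tokens = 0
    for event in stream:
        seen.append(event.kind)
        if event.kind == "token":
            tokens += 1
            if max_tokens is not None and tokens >= max_tokens:
                break  # cancel: stop pulling events
    stream.close()  # explicit close; a no-op if the generator already finished
    return seen, log
```

Calling consume() walks the whole stream, while consume(max_tokens=3) stops early, so the cost and done events are never produced. In a server, the same generator would be mapped through to_sse and handed to a streaming response.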

Verified (offline)

  • The stream emits status -> tool -> citations -> tokens -> cost -> done; the assembled answer is "Dana Okafor leads the Payments team, which runs billing. [D1, D3]" with citations [D1, D3] matching.
  • The answer arrives as 10 separate token chunks (live, not one blob); the cost event reports a positive value.
  • Cancellation: stopping after 3 tokens means no cost or done event is produced.
  • app.py verified with FastAPI TestClient: /health ok; /chat/stream returns SSE frames starting with a status event and ending with done.
  • All files compile. Everything runs offline via the mock; a production build wraps the real SDK streaming API behind the same event stream, so the UI code is unchanged.
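
The kind of check described for /chat/stream (first frame is status, last is done) can be approximated without a server by parsing SSE text directly. The sketch below is not the repo's test: parse_sse, summarize, the sample body, and the {"kind": ..., "data": ...} payload shape are hypothetical, and it assumes LF line endings and data-only frames.

```python
import json
from typing import Any, Dict, List


def parse_sse(body: str) -> List[Dict[str, Any]]:
    """Decode 'data: <json>' frames separated by blank lines (LF endings assumed)."""
    events = []
    for frame in body.split("\n\n"):
        data_lines = []
        for line in frame.split("\n"):
            if line.startswith("data:"):
                value = line[len("data:"):]
                # SSE drops one optional space after the colon.
                data_lines.append(value[1:] if value.startswith(" ") else value)
        if data_lines:
            events.append(json.loads("\n".join(data_lines)))
    return events


def summarize(events: List[Dict[str, Any]]) -> Dict[str, str]:
    """Report the first and last event kinds and the answer rebuilt from token events."""
    kinds = [e["kind"] for e in events]
    answer = "".join(e["data"]["text"] for e in events if e["kind"] == "token")
    return {"first": kinds[0], "last": kinds[-1], "answer": answer}


# Hypothetical response body, built the way a server might frame its events.
SAMPLE_BODY = "".join(
    f"data: {json.dumps(e)}\n\n"
    for e in [
        {"kind": "status", "data": {"text": "searching"}},
        {"kind": "token", "data": {"text": "Dana "}},
        {"kind": "token", "data": {"text": "Okafor"}},
        {"kind": "done", "data": {}},
    ]
)

summary = summarize(parse_sse(SAMPLE_BODY))
```

With a real server, the text passed to parse_sse would be the response body from the test client rather than SAMPLE_BODY.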