M32 solution: AI support desk and AIOps

A dependency-free support desk (triage → route → SLA → escalate) and an AIOps correlator that collapses an alert storm into a few incidents. Offline, deterministic, no API key.

Files

| File | Role |
| --- | --- |
| support_desk.py | Ticket, triage (keyword classifier + confidence; stand-in for an LLM classifier), route (severity → tier + SLA, unsure → human), sla_check (escalate one tier on a breach), and handle (the full path for one ticket). |
| aiops.py | correlate (group alerts by service+symptom within a time window), noise_reduction, and summarize. The operations side: page on causes, not symptoms. |
| demo_mock.py | A→E: triage five tickets, route them, escalate the one that missed its SLA, then collapse a 40-alert storm into incidents. Start here. |
| ../starters/priority_queue.py | Your turn: order the routed tickets by SLA urgency (most at-risk first). |

Run it

# offline, free, instant, deterministic:
python demo_mock.py

No key and no .env are needed for this module.

The ideas, and how they fit the rest of the course

  • triage — read a ticket, assign a severity and a confidence. Production swaps the keyword rules for an LLM classifier (M5/M9) behind the same function; the lab keeps it deterministic so it is testable.
  • confidence gate — when triage is unsure (a vague ticket), do not trust the label: route to a human (human-in-the-loop, M14/M22). Acting confidently on a shaky classification is how desks hurt users.
  • routing + SLA — severity decides the tier (L1/L2/L3) and the response-time promise (the SLA).
  • escalation — miss the SLA and the ticket climbs a tier automatically, so nothing rots in a queue.
  • AIOps correlation — one root cause fires dozens of alerts; grouping them by cause turns 40 pages into a handful of incidents. Each correlated incident is exactly what M31 opens and runs a runbook against.
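The first four ideas can be sketched in a few lines. This is a minimal illustration, not the contents of support_desk.py: the function names match the file table, but the signatures, the keyword lists, the 0.5 confidence floor, the 0.9 score for a keyword hit, and the sev1/sev3 tiers and SLA minutes are all assumptions. Only the sev2 → L2 pairing with a 60-minute SLA and the 0.2 confidence for a vague ticket come from this README.

```python
from dataclasses import dataclass

# Hypothetical keyword rules; the real classifier may use different ones.
KEYWORDS = {
    "sev1": ("outage", "data loss"),
    "sev2": ("error", "failing", "timeout"),
    "sev3": ("how do i", "question", "typo"),
}
# Assumed mapping: only sev2 -> ("L2", 60) is stated in this README.
ROUTES = {"sev1": ("L3", 15), "sev2": ("L2", 60), "sev3": ("L1", 240)}
TIERS = ["L1", "L2", "L3"]
CONFIDENCE_FLOOR = 0.5  # assumed threshold for trusting the label


@dataclass
class Ticket:
    id: str
    text: str
    waited_min: int = 0


def triage(ticket):
    """Return (severity, confidence) from the first keyword hit."""
    text = ticket.text.lower()
    for severity in ("sev1", "sev2", "sev3"):
        if any(word in text for word in KEYWORDS[severity]):
            return severity, 0.9
    return "sev3", 0.2  # nothing matched: a low-confidence guess


def route(severity, confidence):
    """Return (queue, sla_minutes); an unsure label goes to a human."""
    if confidence < CONFIDENCE_FLOOR:
        return "human_review", None
    return ROUTES[severity]


def sla_check(tier, sla_min, waited_min):
    """Climb one tier when the wait exceeds the SLA; L3 is the ceiling."""
    if sla_min is None or waited_min <= sla_min:
        return tier
    return TIERS[min(TIERS.index(tier) + 1, len(TIERS) - 1)]


ticket = Ticket("T2", "Checkout keeps failing with an error", waited_min=90)
severity, confidence = triage(ticket)
tier, sla = route(severity, confidence)
print(severity, tier, sla_check(tier, sla, ticket.waited_min))  # sev2 L2 L3
```

Keeping triage behind a single function that returns a label plus a confidence is what would let an LLM classifier replace the keyword rules without touching route or sla_check.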

Verified (offline)

  • demo_mock.py is deterministic: T1→sev1, T2/T5→sev2, T3→sev3, and the vague T4 gets confidence 0.2 and is sent to human_review; T2 waited 90m against a 60m SLA and escalates L2→L3; the rest stay within SLA.
  • AIOps: a 40-alert storm correlates into 5 incidents (88% noise reduction): two model-api/5xx flares (separated in time), one model-api/latency, one vector-store/timeout, one auth singleton.
  • support_desk.py and aiops.py are dependency-free and import without a key. The correlated incidents feed straight into M31's incident lifecycle; the human-review path is the M22 approval idea applied to triage.
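The correlation result above can be reproduced with a small sketch. It is not the code in aiops.py: the alert tuple layout, the 300-second window, the rule that a gap longer than the window starts a new incident, the timestamps, and the auth symptom label are all assumptions, and the storm is made-up data shaped to match the counts reported above.

```python
from collections import defaultdict


def correlate(alerts, window_s=300):
    """Group (timestamp, service, symptom) alerts into incidents.

    Alerts sharing service+symptom join the same incident until the gap
    to the previous alert exceeds window_s (assumed windowing rule).
    """
    by_key = defaultdict(list)
    for ts, service, symptom in alerts:
        by_key[(service, symptom)].append(ts)

    incidents = []
    for (service, symptom), stamps in by_key.items():
        stamps.sort()
        group = [stamps[0]]
        for ts in stamps[1:]:
            if ts - group[-1] > window_s:
                incidents.append({"service": service, "symptom": symptom,
                                  "start": group[0], "alerts": len(group)})
                group = []
            group.append(ts)
        incidents.append({"service": service, "symptom": symptom,
                          "start": group[0], "alerts": len(group)})
    return incidents


def noise_reduction(n_alerts, n_incidents):
    """Fraction of pages avoided by paging once per incident."""
    return 1 - n_incidents / n_alerts if n_alerts else 0.0


# Illustrative 40-alert storm (timestamps in seconds, invented for this sketch).
storm = (
    [(t, "model-api", "5xx") for t in range(0, 120, 10)]           # 12, first flare
    + [(t, "model-api", "5xx") for t in range(2000, 2100, 10)]     # 10, second flare
    + [(t, "model-api", "latency") for t in range(0, 180, 20)]     # 9
    + [(t, "vector-store", "timeout") for t in range(5, 165, 20)]  # 8
    + [(50, "auth", "login_failure")]                              # 1 singleton
)

incidents = correlate(storm)
print(len(storm), len(incidents))                            # 40 5
print(f"{noise_reduction(len(storm), len(incidents)):.1%}")  # 87.5%
```

The arithmetic is 1 − 5/40 = 0.875, which the summary above rounds to 88%. The two model-api/5xx flares stay separate only because the gap between them is longer than the window, so the window size directly controls how aggressively alerts are merged.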