Reference scenario
Case: P1 incident response with structured runbook and postmortem
Production outage involving SRE, on-call, the engineering lead, and product, under a 99.9% SLA commitment, with customer-facing impact and executive visibility.
Before
- Response coordinated via Slack; by minute 15 it was unclear who was doing what
- Rollback executed without documented confirmation that it was safe
- Postmortem written as a Google Doc two days later; action items assigned informally in the meeting and never followed up
After
- Incident runbook opened when the page fires: detection, triage, containment, and resolution as explicit steps with owners
- Rollback required documented confirmation from on-call lead before execution
- Postmortem structured as a flow run: root cause fields required, action items assigned to named engineers with due dates
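
As a rough illustration of the last two points, the sketch below models a rollback that refuses to run without a recorded confirmation and a postmortem that reports which required fields are still empty. The names here (Incident, Postmortem, ActionItem, missing_fields) are hypothetical and not taken from any specific tool; this is a minimal sketch of the idea, not a description of how a particular product implements it.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ActionItem:
    description: str
    owner: str                  # a named engineer, not a team alias
    due: Optional[date] = None  # required before the postmortem can close


@dataclass
class Postmortem:
    root_cause: str = ""
    action_items: List[ActionItem] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of required fields that are still empty."""
        missing = []
        if not self.root_cause.strip():
            missing.append("root_cause")
        for i, item in enumerate(self.action_items):
            if not item.owner.strip():
                missing.append(f"action_items[{i}].owner")
            if item.due is None:
                missing.append(f"action_items[{i}].due")
        return missing


@dataclass
class Incident:
    # Hypothetical field: who confirmed the rollback was safe, if anyone.
    rollback_confirmed_by: Optional[str] = None

    def confirm_rollback(self, on_call_lead: str) -> None:
        """Record the on-call lead's confirmation on the incident."""
        self.rollback_confirmed_by = on_call_lead

    def execute_rollback(self) -> str:
        """Refuse to roll back until a confirmation has been recorded."""
        if self.rollback_confirmed_by is None:
            raise PermissionError(
                "rollback needs documented confirmation from the on-call lead"
            )
        return f"rollback executed, confirmed by {self.rollback_confirmed_by}"
```

In this sketch the confirmation lives on the incident record itself, so the question "who approved the rollback?" has an answer after the fact, and a postmortem with a non-empty missing_fields() result can be blocked from closing.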