Created on 2025-07-29 05:47
Published on 2025-07-29 10:00
Incidents are supposed to be technical. A service fails. An alert fires. Engineers swarm. The issue gets mitigated. A postmortem is written. Learnings are shared. Case closed. But anyone who’s been through a high-stakes incident knows: it’s never just about the tech. Behind every Sev1 is a tangle of human dynamics: blame, credit, fear, ego, power. The politics of incident management are subtle, uncomfortable, and deeply impactful. So let’s unpack this messy side of SRE that rarely shows up in runbooks.
**Incidents as Social Events**
At first glance, an incident is a disruption in service. But under the surface, it’s a disruption in narrative. Who saw the issue first? Who responded? Who didn’t? Who escalated and who didn’t? These questions are about people as much as systems. And how they’re answered affects morale, trust, visibility, and even careers.
**The Case for Keeping It Technical**
Many engineers argue incidents should stay “clean”:
Focus on root cause and resolution.
Leave politics at the door.
Use blameless retros.
Stick to facts and timelines.
This ideal keeps the conversation focused and reduces emotional baggage. It encourages openness. It keeps engineers from “covering their tracks” or throwing others under the bus. In this model, the system failed—not the people. Culture absorbs shock. Learning improves. Politics don’t enter the room.
**But Politics Sneak In Anyway**
In reality, politics are always present. And ignoring them doesn’t make them disappear—it just makes them harder to spot. Here’s how they show up:
Visibility and Credit: The engineer who saves the day gets praise. The one who quietly prevented the incident may not.
Blame Avoidance: Teams may deflect or minimize their role to protect themselves, or their roadmap.
Escalation Hesitancy: Engineers fear waking leadership or being “wrong,” so they delay escalation, which makes things worse.
Narrative Shaping: Postmortems get sanitized. Language softens. Impact is downplayed or redirected.
Resource Justification: Some teams use incidents to argue for more headcount, better tooling, or to justify delays in deliverables.
Hierarchy Interference: Leaders join calls and shift focus. Tech priorities become PR priorities. Engineers lose agency.
**Why It Matters**
These political dynamics affect more than egos. They impact:
Team trust: Do engineers feel safe speaking up?
Organizational learning: Are the real causes examined—or glossed over?
System design: Are tradeoffs discussed honestly—or spun for optics?
Burnout: Do engineers feel supported—or scapegoated?
Ignoring politics leads to shallow reviews and repeated failures.
**Real-World Example: The Missed Page**
A cloud company suffered a prolonged outage when a pager alert failed to trigger during a cascading failure. In the postmortem:
The infra team pointed to the alerting team.
The alerting team blamed the SRE platform.
The SRE platform said the policy wasn’t configured by the app team.
Leadership wanted answers. Instead of root cause, they got finger-pointing. The report was vague. Engineers were demoralized. Trust eroded. Later, an internal retrospective held privately revealed:
Engineers didn’t escalate out of fear.
Alerts were tuned down after a previous exec escalation.
Ownership was murky across teams.
The real problem wasn’t tech. It was politics. And no dashboard could fix it.
**How to Handle the Politics Well**
Acknowledge That Politics Exist: Don’t pretend the room is neutral. Power dynamics, visibility, and reputation all matter.
Create Psychological Safety: Engineers should feel safe speaking honestly, especially during postmortems.
Define Clear Roles: During incidents, who’s the incident commander? Who communicates externally? Who drives mitigation?
Train Leadership: Execs should understand how to join calls without taking over. Their role is support, not control.
Value Prevention and Participation: Celebrate near-miss detection. Praise engineers who de-escalate issues early, not just the heroes.
Use Structured Reviews: Templates, timelines, and fact-based formats help minimize narrative manipulation.
Create Postmortem Moderators: Appoint neutral facilitators who can guide the discussion, prevent derailment, and surface key learnings.
Watch for “Status Gaming”: Be cautious of teams who always “look good” in reviews. They might be hiding systemic issues, or avoiding real scrutiny.
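To make the "clear roles" and "structured reviews" ideas concrete, here is a minimal sketch of what a fact-based review record could look like in Python. Everything in it is an assumption for illustration: the class names (`IncidentReview`, `TimelineEntry`), the three role fields, and the `gaps()` check are hypothetical and not taken from any particular incident tool.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TimelineEntry:
    at: datetime  # when it happened
    actor: str    # a role or system name, rather than a person to blame
    event: str    # what was observed or done, stated as a fact


@dataclass
class IncidentReview:
    # Hypothetical template: the role names mirror the three questions above.
    title: str
    incident_commander: str = ""
    communications_lead: str = ""
    mitigation_lead: str = ""
    timeline: list[TimelineEntry] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """Return structural gaps a moderator should close before the review."""
        problems = []
        for role in ("incident_commander", "communications_lead", "mitigation_lead"):
            if not getattr(self, role):
                problems.append(f"unassigned role: {role}")
        if not self.timeline:
            problems.append("timeline is empty")
        times = [entry.at for entry in self.timeline]
        if times != sorted(times):
            problems.append("timeline is not in chronological order")
        return problems


# Illustrative usage with made-up values.
review = IncidentReview(title="Example outage", incident_commander="on-call SRE")
review.timeline.append(
    TimelineEntry(datetime(2025, 1, 1, 3, 0), "alerting", "page did not fire")
)
print(review.gaps())
```

A template like this will not remove politics, but it gives a moderator something neutral to point at: an unassigned role or a hole in the timeline is a gap in the record, not an accusation against a team.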
**The Upside of Healthy Politics**
When handled well, incidents become accelerators of trust and growth:
Engineers feel seen and supported.
Organizations invest in reliability.
Leadership learns about system fragility.
Cultural debt gets paid down alongside tech debt.
Politics don’t have to be toxic. They can be a signal of misalignment, of fear, or of growth opportunity.
**Final Thought**
You can’t remove politics from incident management. But you can choose how to navigate them. Do you pretend they don’t exist while engineers whisper in side channels? Or do you name them, address them, and use them to build a healthier, more resilient team? Because systems fail. That’s what they do. But how we respond, technically and socially, is what determines if we’re just fixing code or building a culture that doesn’t break under pressure. And that starts with managing not just the incident, but the humans in the room.