Created on 2025-04-14 07:44
Published on 2025-04-28 10:00
The day the infrastructure team at a mid-size SaaS company rebranded itself as “SRE” was the day everything—and yet nothing—changed.
The nameplates changed. The Slack channel names changed. The job titles in the org chart changed. But for the engineers themselves? The pager was still on. The ticket queue was still long. And the feeling of being “just ops” remained.
Welcome to the world of “SRE as Ops 2.0,” a label that stirs up more debate than most titles in tech. For some, it represents a meaningful evolution in how we manage production systems. For others, it’s little more than operations with a new badge. So, what’s the truth?
To understand the controversy, we need to look at what SRE was supposed to be.
Site Reliability Engineering wasn’t created to be “ops with code.” It was built to replace traditional operations with engineering rigor. Born at Google, SRE was a radical concept: instead of humans babysitting infrastructure, software engineers would own reliability. They’d write code to automate operational tasks, define service-level objectives (SLOs), manage risk using error budgets, and constantly reduce toil.
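To make the error-budget idea concrete, here is a minimal sketch of the arithmetic, assuming a simple time-based availability SLO. The function names, the 30-day window, and the 99.9% target are illustrative assumptions, not something prescribed by Google or any particular tool.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of unavailability an availability SLO tolerates over a window.

    slo_target is a fraction such as 0.999 (hypothetical example target).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, window_days: int,
                     downtime_minutes: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    # A 99.9% target over 30 days leaves roughly 43 minutes of budget.
    print(round(error_budget_minutes(0.999, 30), 1))
    # After 10.8 minutes of downtime, about three quarters of it remains.
    print(round(budget_remaining(0.999, 30, 10.8), 2))
```

The point of the number is the conversation it enables: while budget remains, the team can ship and take risks; once it is spent, reliability work takes priority.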
In this vision, SREs aren’t gatekeepers; they’re system builders. They reduce human intervention. They balance innovation with stability. And they collaborate with dev teams, not just clean up after them.
But outside of Google, things got messy.
As companies rushed to adopt SRE, often without fully understanding its principles, they ended up creating something halfway. They gave the old ops team a few Python courses, renamed them “SRE,” and hoped for magic. The result? Same workflows, same firefighting, just under a shinier title. Engineers felt burned. Expectations weren’t met. And soon, “SRE” became synonymous with “Ops 2.0.”
So, is that bad? Let’s look at both sides.
**The argument for SRE as Ops 2.0**
There’s nothing inherently wrong with operations. Keeping systems online, monitoring performance, resolving incidents—these are critical responsibilities. Rebranding ops as SRE acknowledges the growing complexity of modern infrastructure. It reflects the fact that ops today often involves writing infrastructure-as-code, maintaining CI/CD pipelines, and managing cloud-native environments.
In that light, “Ops 2.0” is a step up. It’s recognition that the job has evolved. It’s saying: “We’re not just rebooting servers anymore. We’re building platforms.”
For many companies, that’s a meaningful transition. The team might not be writing SLOs or tracking toil yet, but they’re on the path. Calling them “SRE” is aspirational. It’s a signal of intent.
**The argument against SRE as Ops 2.0**
On the other hand, calling a traditional ops team “SRE” without changing how they work dilutes the term. It sets up the team, and their stakeholders, for disappointment. Dev teams expect proactive tooling and guidance. Leadership expects data-driven reliability engineering. But if the team is still reacting to tickets and manually patching servers, those expectations won’t be met.
Worse, it burns out the engineers. They’re promised a strategic role but handed a support queue. They’re expected to automate but never given time to do it. They’re measured on uptime but excluded from design discussions.
This isn’t just bad for morale. It’s bad for systems. It leads to siloed thinking, fragile tooling, and reactive cultures.
So what’s the middle path?
**Start with clarity**
If you’re going to call a team “SRE,” be clear about what that means. Are they responsible for defining SLIs/SLOs? Do they own postmortems? Are they embedded in dev teams, or are they centralized? What percentage of their time is spent on toil versus engineering?
Define the expectations. If you’re in transition, call it out. Say, “We’re an ops team moving toward SRE practices.” That honesty builds trust.
**Invest in transformation**
You can’t go from ticket-driven ops to proactive reliability engineering overnight. It takes time, tools, and leadership support. Start by identifying high-toil areas and automating them. Then move into observability, incident response frameworks, and eventually, reliability metrics. Bring devs into the conversation early. SRE works best when it’s a partnership, not a silo.
**Measure the right things**
If you want your SRE team to behave differently than ops, you need to measure different outcomes. Don’t just track uptime or MTTR. Track toil reduction. Track incidents prevented. Track adoption of reliability standards. And reward long-term thinking.
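As one way to make “track toil reduction” measurable, the sketch below computes the share of each week spent on toil from self-reported hours. The `WeekLog` structure, the field names, and the 50% cap are assumptions for illustration; pick whatever threshold and data source fit your team.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WeekLog:
    """Self-reported hours for one week (hypothetical data shape)."""
    toil_hours: float
    engineering_hours: float


def toil_fraction(log: WeekLog) -> float:
    """Share of logged time spent on toil; 0.0 if nothing was logged."""
    total = log.toil_hours + log.engineering_hours
    if total == 0:
        return 0.0
    return log.toil_hours / total


def weeks_over_cap(logs: List[WeekLog], cap: float = 0.5) -> List[int]:
    """Indexes of weeks whose toil share exceeds the cap (assumed 50%)."""
    return [i for i, log in enumerate(logs) if toil_fraction(log) > cap]


if __name__ == "__main__":
    history = [WeekLog(30, 10), WeekLog(24, 16), WeekLog(18, 22)]
    print([round(toil_fraction(w), 2) for w in history])
    print(weeks_over_cap(history))
```

A falling trend across weeks is the signal worth rewarding, even in quarters when uptime looks unchanged.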
**Change culture, not just titles**
The biggest shift from ops to SRE isn’t about skills—it’s about mindset. SREs don’t just respond to incidents—they prevent them. They don’t just take orders—they co-own systems. They’re not the cleanup crew—they’re part of the build process.
Changing that mindset across engineering takes time. It starts with leadership. It spreads through collaboration. And it’s cemented by celebrating the right wins.
**A personal take**
I’ve been on both sides of this debate. I’ve worked on teams where SRE meant building resilient platforms with deep dev engagement. And I’ve worked on teams where “SRE” was just a pager with a different ringtone.
Here’s what I’ve learned: the label only matters if it matches the mission.
If you’re building tools, defining standards, and improving systems, call it SRE. If you’re triaging tickets and putting out fires, that’s ops. Both are valid. Both are needed. But don’t confuse one for the other.
Because when you call ops “SRE” without enabling them to do SRE work, you don’t elevate the team; you undercut them.
In the end, “Ops 2.0” isn’t a destination. It’s a fork in the road.
One path leads to evolution. The other leads to rebranding.
Choose wisely.