Created on 2025-12-18 09:05
Published on 2025-12-22 11:00
If you’ve ever sat through a 90-minute Change Advisory Board that approved a one-line feature flag flip, you’ve probably sworn off ITIL forever. If you’ve also lived through a 2 a.m. outage where no one could find who owned the dying microservice, you’ve probably wished for a little more ITIL. That’s the paradox: pure process slows you down; pure velocity melts production. As an SRE, you don’t need “more process” or “less process.” You need the right process, thinly sliced, automated, and fused to SRE’s feedback loops.
ITIL 4 quietly did some growing up. It replaced the old lifecycle vibe with a “service value system,” and it centers seven guiding principles like “start where you are,” “progress iteratively with feedback,” and “optimize and automate.” That’s not grandpa’s paperwork party; it’s language SREs already speak. The difference is intent. ITIL 4 gives a governance skeleton; SRE supplies the muscles, sensors, and reflexes. Used together, you can move fast and avoid breaking your customers.
The old debate assumed a binary: DevOps and SRE deliver speed; ITIL prevents chaos. ITIL 4 explicitly rejects that framing and embraces agile and DevOps ways of working. In fact, guidance co-authored by AXELOS and Atlassian says the future of ITSM is “more agile than ever,” with streamlined change control through automation and collaboration. Translation for SREs: fewer forms, more policy-as-code; fewer approvals, more risk-based gates.
Meanwhile, the world around us evolved. The latest DORA research emphasizes platform engineering, user-centricity, and stable priorities — all levers SREs use daily to tame complexity and improve throughput and stability. If you’re building paved roads, catalogs, templates, and golden paths, you’re already doing the “service value” bit of ITIL without calling it that.
Let’s address the elephant in the CAB: heavyweight, external change approvals hurt delivery performance and don’t improve stability. DORA’s data has been consistent for years on this point. If your change process is a weekly tribunal, you’re giving up speed without buying safety. High performers automate change risk evaluation, route low-risk changes straight to production, and rely on guardrails, not gatekeepers.
SRE brings the missing mechanism: SLOs and error budgets. An error budget isn’t a vibe; it’s a mutually agreed ledger of “how much unreliability we can afford this period.” Exhaust the budget, and a change freeze kicks in for everything except reliability, security, or hotfix work. That isn’t politics; it’s math-backed governance, and it maps neatly onto “change enablement” in ITIL 4.
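To make the ledger concrete, here is a minimal sketch of the arithmetic, assuming a request-based SLO measured over a fixed period. The names `BudgetStatus` and `error_budget` are illustrative, not taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetStatus:
    allowed_bad: float  # bad events the SLO tolerates in this period
    consumed: float     # fraction of the budget already spent (1.0 = exhausted)
    exhausted: bool     # True once the whole budget is gone


def error_budget(slo_target: float, total_events: int, bad_events: int) -> BudgetStatus:
    """Compute error-budget status for a request-based SLO (hypothetical helper)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    # The budget is simply the share of events the SLO allows to fail.
    allowed_bad = (1.0 - slo_target) * total_events
    # Guard against an empty period, where no budget exists yet.
    consumed = bad_events / allowed_bad if allowed_bad > 0 else 0.0
    return BudgetStatus(allowed_bad, consumed, consumed >= 1.0)
```

With a 99.9% target over 1,000,000 requests, the budget is about 1,000 failed requests; 250 failures means roughly a quarter of it is spent, and 1,200 means the freeze policy applies.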
Camp A: “ITIL is dead; SRE won.”
These folks show scar tissue from v3-era CABs and CMDB sprawl. Their rallying cry is “you build it, you run it,” zero approvals, and “if it hurts, do it more often.” They’re not wrong that manual approvals scale poorly, and the data backs them up on the harm of heavyweight change boards. But they can drift into an allergy to any governance, which comes back to bite when audits, cross-team dependencies, or regulated workloads appear.
Camp B: “Without ITIL, chaos.”
This camp remembers the pre-ITIL wild west: undocumented runbooks, mystery queues, and shrug emojis for SLAs. They want roles, practices, and clear handoffs. Also not wrong. But when “process” becomes a cargo cult, your MTTR balloons and your developers route around you — often by creating their own shadow “platform,” which is just governance with extra steps.
Here’s the plot twist: ITIL 4 plus SRE is the synthesis. ITIL gives a shared language for intent and accountability; SRE provides the quantitative knobs and the automation to uphold that intent at speed. Even the DORA research themes — platform engineering, product thinking, stable priorities — line up with the ITIL 4 service value system when you look past the acronyms.
It’s 03:07. Your on-call channel lights up like a Christmas tree. Cart checkout is failing intermittently, customer support is reporting timeouts, and finance is nervous. You need two things instantly: a crisp SLO view to decide if you’re burning budget fast, and an owner map to find the team that actually runs the dependency that just fell over. The SRE muscle memory kicks in: error-budget burn alerts page the right people, you gate deploys automatically, and you spin up a quick post-incident doc as you go.
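The burn alert in that scenario can be sketched as a burn-rate check over two windows. This is a simplified illustration, not a production alerting rule: the function names are hypothetical, and the 14.4 threshold is the figure commonly cited from the Google SRE Workbook for a one-hour window against a 30-day SLO, so treat it as a starting assumption to tune.

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are arriving."""
    if total_events == 0:
        return 0.0
    error_rate = bad_events / total_events
    return error_rate / (1.0 - slo_target)


def should_page(long_window: tuple, short_window: tuple,
                slo_target: float, threshold: float = 14.4) -> bool:
    """Page only if both windows burn fast.

    Each window is a (bad_events, total_events) pair. The long window shows
    the burn is significant; the short window shows it is still happening.
    """
    long_burn = burn_rate(long_window[0], long_window[1], slo_target)
    short_burn = burn_rate(short_window[0], short_window[1], slo_target)
    return long_burn >= threshold and short_burn >= threshold
```

Requiring both windows to exceed the threshold is a design choice that trades a little detection delay for fewer pages about incidents that have already recovered.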
Now imagine that the service catalog knows the criticality, the SLOs are registered as first-class attributes, and the relationship graph actually reflects reality. Incident, problem, change, and knowledge aren’t four committees; they’re four angles on the same telemetry. That’s ITIL practices, implemented the SRE way. It’s also the only time you’ll thank past-you for any kind of “process.”
Make change enablement SLO-driven.
Replace “CAB on Thursdays” with a risk model in your pipeline. Express policy in code: if the impacted service’s error budget is healthy and the risk score is low, auto-approve. If the budget is tight or you’re in a reliability remediation window, block or require extra tests. This is change control, but it’s context-aware change control. DORA’s findings about external approvals vs. performance are your justification for this design.
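As a rough sketch of what such a gate might look like in a pipeline step, assuming a risk score and remaining budget are already computed upstream: the thresholds, type names, and the `evaluate` function below are all hypothetical, and the freeze exemptions mirror the reliability, security, and hotfix carve-out described earlier.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"
    EXTRA_TESTS = "require_extra_tests"
    BLOCK = "block"


# Change types still allowed while the budget is exhausted (assumed policy).
FREEZE_EXEMPT = frozenset({"reliability", "security", "hotfix"})


@dataclass(frozen=True)
class Change:
    service: str
    change_type: str          # e.g. "feature", "reliability", "security", "hotfix"
    risk_score: float         # 0.0 (safe) .. 1.0 (risky), from your own model
    budget_remaining: float   # fraction of error budget left; <= 0 means exhausted
    in_remediation_window: bool = False


def evaluate(change: Change, low_risk: float = 0.3, tight_budget: float = 0.25) -> Decision:
    """Context-aware change gate; both thresholds are illustrative defaults."""
    is_low_risk = change.risk_score < low_risk
    if change.budget_remaining <= 0:
        # Freeze: only exempt work proceeds, and only with extra scrutiny.
        if change.change_type in FREEZE_EXEMPT:
            return Decision.EXTRA_TESTS
        return Decision.BLOCK
    if change.budget_remaining < tight_budget or change.in_remediation_window:
        return Decision.EXTRA_TESTS if is_low_risk else Decision.BLOCK
    return Decision.AUTO_APPROVE if is_low_risk else Decision.EXTRA_TESTS
```

A low-risk feature change against a healthy budget returns `Decision.AUTO_APPROVE`; the same change during a remediation window returns `Decision.EXTRA_TESTS` instead.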
Connect incident and problem management to postmortem practice.
Use blameless postmortems as your problem-management engine. Classify incidents by error-budget burn, not just severity code. Turn P0 action items into tracked changes, and tie them to SLOs so you can prove reliability investment is paying down risk — and justify the temporary feature pause when you’ve exceeded the budget. That’s the SRE workbook’s playbook, mapped to ITIL.
Treat your internal platform as the service value chain.
ITIL 4’s value system asks, “How do we create value end-to-end?” SRE and platform engineering answer with golden paths, sensible defaults, and a “paved road” that bakes controls into the developer experience: standardized telemetry, change risk scoring, identity, policy, and rollback. DORA 2024’s emphasis on platform engineering is a strong nudge that this is where elite teams win today.
Use the ITIL principles as guardrails, not gospels.
“Start where you are” means don’t bulldoze working practices in the name of purity. “Progress iteratively with feedback” means derive each improvement from SLO trends, not vibes. “Optimize and automate” is literally the SRE job description. ITIL 4’s principles legitimize SRE’s instincts for exec stakeholders who speak “framework.”
Modernize “service levels” with SLOs.
SLAs are commercial. SLOs are operational. Put SLOs in your service catalog; make them visible to incident managers, product owners, and auditors. When an exec asks, “Are we reliable enough to ship the holiday sale feature?” your answer shouldn’t be a feeling — it should be a budget balance.
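One way to picture SLOs as first-class catalog attributes is a small in-memory registry. This is only a sketch: real catalogs are usually populated from deployment metadata and discovery, and the `ServiceEntry` and `ServiceCatalog` names and fields here are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEntry:
    name: str
    owner_team: str
    criticality: str               # e.g. "tier-1"; the labels are assumptions
    slo_target: float              # e.g. 0.999 for 99.9% availability
    depends_on: tuple = ()         # names of upstream services


class ServiceCatalog:
    """Minimal registry tying each service to an owner and an SLO."""

    def __init__(self) -> None:
        self._entries = {}

    def register(self, entry: ServiceEntry) -> None:
        # Re-registering on every deploy keeps the entry current.
        self._entries[entry.name] = entry

    def get(self, name: str) -> ServiceEntry:
        return self._entries[name]

    def owners_of_dependencies(self, name: str) -> dict:
        """Map each dependency to its owning team, or 'unknown' if unregistered."""
        owners = {}
        for dep in self._entries[name].depends_on:
            dep_entry = self._entries.get(dep)
            owners[dep] = dep_entry.owner_team if dep_entry else "unknown"
        return owners
```

During an incident, `owners_of_dependencies("checkout")` answers the “who runs the thing that just fell over” question, and `get("checkout").slo_target` gives everyone the same number to reason from.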
I’ve seen “thin ITIL” approaches work best. Keep observable, code-enforced policy; ditch the ritual. Instead of weekly CABs, you have a Change Authority expressed in code that scores risks and approves standard changes. Instead of a dusty CMDB, you maintain an evolving service catalog as a byproduct of deployment (service registration) and runtime (discovery), tied to owners and SLOs. Instead of a separate problem team, you run blameless postmortems that feed directly into reliability backlogs. And instead of arguing frameworks, you use error budgets as the single arbiter of prioritization.
The cultural win is big: product, platform, SRE, and service management stop fighting over who’s “right.” The budget tells you. If you’re burning it too fast, you pause features and invest in hardening. If you’re green, you can ship and learn. This is the “both/and” that IT leaders have been trying to explain since the first time someone asked whether ITIL and DevOps can coexist. They can — and ITIL 4 actually says the quiet part out loud: bring agile and DevOps in; keep the governance clarity.
Regulated environments still need auditable change control, evidence of approvals, and traceability. None of that requires committees if you have policy-as-code, identity, attestations, and immutability. Your “approval” can be a signed, risk-scored check in the pipeline that satisfies auditors and never wakes a human up. Pair that with SLO-based change freezes and you’ll often be more compliant and more reliable than a manual process that rubber-stamps everything at 4 p.m. on Fridays.
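To illustrate the shape of a signed, risk-scored approval record, here is a minimal sketch using an HMAC from the Python standard library. It assumes a shared secret for brevity; a real pipeline would more likely use asymmetric signatures and managed keys, and the record fields and function names are hypothetical.

```python
import hashlib
import hmac
import json


def sign_approval(record: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over a canonical JSON form of the record."""
    # Sorting keys and fixing separators makes the serialization deterministic,
    # so the same record always produces the same signature.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_approval(record: dict, signature: str, key: bytes) -> bool:
    """Check that the record has not been altered since it was signed."""
    expected = sign_approval(record, key)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature)
```

An auditor can later re-run `verify_approval` against the stored record and signature; any edit to the risk score, commit, or decision after the fact makes verification fail.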
Two trends: first, platform engineering is now the operational substrate for many orgs. ITIL’s “value chain” maps neatly to platform product management: demand flows into backlog, into paved roads, into measurable outcomes. Second, AI is changing how we code, test, triage, and document — and it’s already in the DORA conversation. That doesn’t absolve you from guardrails; it makes them even more necessary, because AI accelerates everything, including mistakes. SRE’s SLO-first approach and ITIL’s focus on value and continual improvement give you a sane way to adopt AI without turning production into a science experiment.
Start by codifying a change risk model in your CI/CD. Look at blast radius, rollback readiness, prior failure rate, test coverage, and whether the service is in budget. High risk? Require extra controls. Low risk? Straight through. Now register your SLOs in the catalog, so incident managers and developers are arguing about facts, not adjectives. Next, declare an error-budget policy in writing: what freezes, when, and who can override. Make it boring and automatic — alerts, tickets, and gates should fire without meetings. Then, tie postmortems to change, so every P0 action shows up in a reliability lane with owners and dates. Finally, add a paved road lane where teams get self-service templates for standard changes, incident comms, and rollbacks. You’ve just implemented ITIL practices with SRE hands.
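The first step, the risk model, can start as a simple weighted score over the five signals named above. This is a sketch under stated assumptions: the weights and the 0.3 routing threshold are invented for illustration and should be calibrated against your own change-failure history.

```python
from dataclasses import dataclass

# Illustrative weights (they sum to 1.0); calibrate against real outcomes.
WEIGHTS = {
    "blast_radius": 0.30,
    "no_rollback": 0.25,
    "prior_failure_rate": 0.20,
    "coverage_gap": 0.15,
    "out_of_budget": 0.10,
}


@dataclass(frozen=True)
class ChangeFacts:
    blast_radius: float        # share of users or traffic affected, 0..1
    rollback_ready: bool       # is a tested rollback path available?
    prior_failure_rate: float  # share of this service's recent changes that failed, 0..1
    test_coverage: float       # 0..1
    in_budget: bool            # is the service within its error budget?


def risk_score(facts: ChangeFacts) -> float:
    """Weighted risk in [0, 1]; higher means riskier."""
    for name in ("blast_radius", "prior_failure_rate", "test_coverage"):
        value = getattr(facts, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return (
        WEIGHTS["blast_radius"] * facts.blast_radius
        + WEIGHTS["no_rollback"] * (0.0 if facts.rollback_ready else 1.0)
        + WEIGHTS["prior_failure_rate"] * facts.prior_failure_rate
        + WEIGHTS["coverage_gap"] * (1.0 - facts.test_coverage)
        + WEIGHTS["out_of_budget"] * (0.0 if facts.in_budget else 1.0)
    )


def route(facts: ChangeFacts, threshold: float = 0.3) -> str:
    """High risk gets extra controls; low risk goes straight through."""
    return "extra_controls" if risk_score(facts) >= threshold else "straight_through"
```

A linear score is deliberately boring: it is easy to explain to an auditor and easy to adjust when postmortems show a signal was over- or under-weighted.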
As an SRE, treat ITIL 4 like a measuring tool, not a doctrine. Use the principles to explain your design choices; use the practices to align people; use the SRE math to make it real and fast. The pager doesn’t care about acronyms. Customers don’t either. They care that the cart works, the transfer completes, the page loads, and the lights stay on. If a thin layer of ITIL helps your SRE practice make that boringly true, then yes — ITIL is still relevant. When in doubt, let the error budget decide.
PeopleCert — “ITIL 4 Foundation” — https://www.peoplecert.org/browse-certifications/it-governance-and-service-management/ITIL-1/itil-4-foundation-2565
Atlassian & AXELOS — “A practical guide to ITIL 4 in an age of agile” — https://www.atlassian.com/whitepapers/itil4
DORA — “Accelerate State of DevOps Report 2024” — https://dora.dev/research/2024/dora-report/
DORA — “2019 Accelerate State of DevOps Report” (PDF) — https://dora.dev/research/2019/dora-report/2019-dora-accelerate-state-of-devops-report.pdf
Google SRE Workbook — “Error Budget Policy for Service Reliability” — https://sre.google/workbook/error-budget-policy/