(This is a repost of a document living here, but I am putting it here for backup's sake. Originally a joint effort with Murali Suriar, with input from Matt Brown, Liz Fong-Jones, and many others. The intended audience of this doc is the recently laid-off, or those who| RelyAbility Blog
[A repost for reference, since the original was removed as part of house-cleaning elsewhere] I'm solidly in favour of a planning architecture of some kind for any team-size collection of people greater than about 5. (Hell, arguably above 2, but let’s keep overheads down.) I’
[Note to the reader: chairs have an obvious obligation to remain as neutral as possible, which I take very seriously. If I mention specific speakers or talks below, it is definitely not to the exclusion of others.] Though I’ve been involved with SREcon EMEA many times before, this
“The waste remains, The waste remains and kills” – Missing Dates, William Empson
Economics teaches us that one person’s income is another person’s expenditure. But we don’t think like that when we think of waste. Instead, our usual intuition for waste is
Today, I believe we cannot successfully answer several key questions about SRE. Let's start with the most important one: how can we understand what reliability customers want and need?
Somewhere between 15 and 20 years ago, I worked for a company. It was a very prestigious company, and it was a glorious and frustrating time. The company did amazing things. Literally unbelievable achievements - from my point of view anyway. But this was coupled with levels of chaos that
I've been reflecting recently on the journey from infra/SRE to product/service owner, and my lessons - expected and not - along the way. My initial anchor on product thinking was based on an effort Google had, originally called P2020, to do product development within SRE tailored to the
[Cross-posted to https://www.stanza.systems/post/srecon-americas-2025] Those of us who do booth duty at a conference don’t often get a chance to attend many, if any, sessions. So my view on SREcon is necessarily limited by the fact that I spent more time talking to prospects, customers,
The AI wave is passing over us: what of genuine value will be left behind? asks Niall Murphy.
As a long-time observer of the SRE/DevOps tooling market, I look at the tsunami of AI-powered and LLM-enabled tools currently engulfing our industry like most great wave observers would: half in genuine
This is a topic of intermediate complexity in SLOs. If you are coming to this cold, we recommend you read a few other pieces about SLOs first, then this will make a fair bit more sense to you. SLOs, as you may know, have a dual nature: they have both
Recently we at Stanza have been exploring operational data, and it's been really exciting to bring techniques and ideas from other domains into our domain - production systems generally, traffic, alerting, cloud costs, etc. The thing we’ve been looking at most recently is a thing called Benford’s Law.
Software cannot be shown to be stable, and so it’s safer to assume it isn’t.
What is graceful degradation? Graceful degradation is the idea that, when you can’t serve the user precisely what they wanted, instead of serving the user an error, you serve them some in-between thing. The details of this depend a lot on what exactly it is you’re trying to
Comments/Insights/Contributions from
* Niall Murphy
* Toby Burress
* Štěpán Davidovič
* Sal Furino
(Note that when I say "we" below, I don't specifically intend to speak for these fine people, I'm just using the academic "we". -Niall)
Introduction
If you don’t already know about SLOs, we can recommend Alex Hidalgo’
[Reposted from Medium company blog]
Introduction
I feel like a little bit of a fraud writing about this, since I only managed to attend KubeCon virtually. But I watched enough of it and read enough about it that it gave me some thoughts.
OpenTelemetry (Otel)
Those of us who only