How Google Site Reliability Engineering teams structure on-call rotations and site operations to handle incidents and production issues.| sre.google
Practices worth adopting for other organizations| addyo.substack.com
or Steve or Sally or Rebecca or that stupid mascot dog you have on our website| badsoftwareadvice.substack.com
Sometimes, a seemingly simple and obvious solution can lead to a series of problems later on. This is especially true when adding retries.| Medium
The concept of blameless culture has been around for a long time in other industries, and while the history isn’t clear, you could argue that it became an “official” part of the tech industry with the publication of the definitive book Site Reliability Engineering in 2016. My summary of blameless culture is: when there is […]| cat /dev/brain
Chromium| www.chromium.org
Despite advances in browser tooling, automated evaluation, lab tools, guidance, and runtimes, modern teams struggle to deliver even decent performance with today's popular frameworks. This is not a technical problem per se. It's a management issue, and one that teams can conquer with the right frame of mind and support.| Infrequently Noted