Intricacies of on-call rotations at Google, including strategies for optimizing pager load, psychological safety, and fostering effective teams.| sre.google
How Evernote and Home Depot adopted SLOs to enhance reliability. Learn from their experiences with SLOs and error budgets for improved service quality.| sre.google
Making users more secure generally means annoying them. Whether it’s making them carry a hardware security key or just enforcing a short screensaver timeout, changing how people go about their work is annoying—and an annoyed user is not a secure user. The effectiveness of a lot of security controls relies on the user cooperating. If they get frustrated with all the barriers and friction between them and doing their actual job, they might just find ways around the controls—their shortene...| bradleyjkemp.dev
I share an approach to gradually increasing the reliability of a software development organization, without over- or underinvesting in reliability.| www.rubick.com
This is a complete guide to Kubernetes API Server SLO Alerts. In this new guide, you’ll learn: Kubernetes official Service Level Objectives (SLOs). What are Error Budgets? How to turn Error Budgets into alerts? What are Multiwindow, Multi-Burn-Rate alerts? What is this KubeAPIErrorBudget alert? Mixin’s Kubernetes API Server SLO alerts. Lots more. Kubernetes Service Level […]| Povilas Versockas