Learn about site reliability engineering, and how SRE teams leverage monitoring and custom metrics to build a reliability stack that helps the business serve its users reliably.| 8th Light
Jonathan E. Magen's website!| yonkeltron.com
As usual when I discuss systems theory (e.g. information flow or material flow),| entropicthoughts.com
I share an approach to gradually increasing the reliability of a software development organization, without over- or underinvesting in reliability.| www.rubick.com
Turn SLOs into actionable alerts on significant events using Prometheus alerting. Improve precision, recall, detection time, and time for alerting.| sre.google
Explore the world of site reliability engineering with top-rated SRE books. Find resources on SRE principles, best practices, and the role of a reliability engineer.| sre.google
Learn to use Service Level Objectives (SLOs) for continuous improvement in reliability and customer satisfaction, and discover the importance of SLOs.| sre.google