Engineering toil is slowing down cloud teams and wasting valuable cycles. Learn how ControlMonkey eliminates Terraform toil with automation.| ControlMonkey
The Problem: Two of the communities we serve (NMFS Openscapes and CryoCloud) reported issues with starting GPU nodes on their hubs. Upon investigation, I discovered that the cluster autoscaler had suddenly stopped recognizing that any GPUs were available in the cluster, and hence wasn't provisioning the nodes.| 2i2c
My blog about interesting technology, in particular Cloud Platforms & Services, and my experiences with them| alexos.dev
Getting a lot done is important only when we’re doing the right things| jordankaye.dev
Every ops team has some manual procedures that they haven’t gotten around to automating yet. Toil can never be totally eliminated. Very often, the biggest toil center for a team at a growing …| Dan Slimmon
SREs optimize their time by eliminating toil: the repetitive, predictable tasks related to running production services. Covers the characteristics of toil and how reducing it improves operational efficiency.| sre.google
Blameless postmortems in SRE culture: incident reviews that focus on root cause analysis and preventive actions to build a culture of continuous improvement.| sre.google