You’ve undoubtedly heard of the psychological concept called flow state. This is the feeling you get when you’re in the zone, where you’re doing some sort of task, and you’re just really into it, and you’re focused, and it’s challenging but not frustratingly so. It’s a great feeling. You might experience this with a work … Continue reading My favorite developer productivity research method that nobody uses→| Surfing Complexity
One of the early criticisms of Darwin’s theory of evolution by natural selection was about how it could account for the development of complex biological structures. It’s often not obvious to us how the earlier forms of some biological organ would have increased fitness. “What use”, asked the 19th-century English biologist St. George Jackson … Continue reading Easy will always trump simple→| Surfing Complexity
There are software technologies that work really well in-the-small, but they don’t scale up well. The challenge here is that the problem size grows incrementally, and migrating off of them re…| Surfing Complexity
3 posts published by Lorin Hochstein during August 2025| Surfing Complexity
Accountability is a mechanism that achieves better outcomes by aligning incentives, in particular, negative ones. Specifically: if you do a bad thing, or fail to do a good thing, under your sphere …| Surfing Complexity
Amazon’s recent announcement of their spec-driven AI tool, Kiro, inspired me to write a blog post on a completely unrelated topic: formal specifications. In particular, I wanted to write abou…| Surfing Complexity
Here are a few anecdotes about safety from the past few years. In 2020, the world was struck by the COVID-19 pandemic. The U.S. response was… not great. Earlier in 2019, before the pandemic s…| Surfing Complexity
8 posts published by Lorin Hochstein during May 2025| Surfing Complexity
(With apologies to The Smashing Pumpkins) A few weeks ago, Cloudflare experienced a major outage of their popular 1.1.1.1 public DNS resolver. Technically, the DNS resolver itself was working just fine: it was (as far as I’m aware) up and running the whole time. The problem was that nobody on the Internet could actually reach … Continue reading Cloudflare and the infinite sadness of migrations→| Surfing Complexity
Technopoly by Neil Postman, published in 1993 “Can language models be too big? asked the researchers Emily Bender, Timnit Gebru, Angelina McMillan-Major, and Margaret Mitchell in their famous…| Surfing Complexity
1 post published by Lorin Hochstein during July 2025| Surfing Complexity
Let’s play another round where we contrast the root-cause-analysis (RCA) perspective with the resilience engineering (RE) perspective. Today’s edition is about the distribution of potentiall…| Surfing Complexity
7 posts published by Lorin Hochstein during August 2019| Surfing Complexity
2 posts published by Lorin Hochstein during March 2024| Surfing Complexity
When writing up my impressions of the GCP incident report, Cindy Sridharan’s tweet reminded me that I failed to comment on an important part of it, how the responders brought the overloaded s…| Surfing Complexity
On Thursday (2025-06-12), Google Cloud Platform (GCP) had an incident that impacted dozens of their services, in all of their regions. They’ve already released an incident report (go read it!…| Surfing Complexity
3 posts published by Lorin Hochstein during March 2025| Surfing Complexity
2 posts published by Lorin Hochstein during June 2025| Surfing Complexity
A year ago, Mihail Eric wrote a blog post detailing his experiences working on AI inside Amazon: How Alexa Dropped the Ball on Being the Top Conversational System on the Planet. It’s a great …| Surfing Complexity
Simplicity is prerequisite for reliability. — Edsger W. Dijkstra Think about a system whose reliability had significantly improved over some period of time. The first example that comes to my mind …| Surfing Complexity
The late science fiction author Arthur C. Clarke had a great line: Any sufficiently advanced technology is indistinguishable from magic. (This line inspired the related observation: any sufficientl…| Surfing Complexity
One of the most famous physics experiments in modern history is the double-slit experiment, originally performed by the English physicist Thomas Young back in 1801. You probably learned about this…| Surfing Complexity
I don’t know anything about your organization, dear reader, but I’m willing to bet that the amount of time and attention your organization spends on post-incident work is a function of …| Surfing Complexity
One of the criticisms leveled at resilience engineering is that the insights that the field generates aren’t actionable: “OK, let’s say you’re right, that complex systems ar…| Surfing Complexity
Safety researchers love using metaphors as a framework to describe how accidents happen, which they call accident models. One of the earliest models, dating back to 1931, is Herbert W. Heinrich’s…| Surfing Complexity
If you’re a regular reader of this blog, you’ll have noticed that I tend to write about two topics in particular: Resilience engineering Formal methods I haven’t found many people…| Surfing Complexity
Laura Nolan of Slack recently published an excellent write-up of their Jan. 4, 2021 outage on Slack’s engineering blog. One of the things that struck me about this writeup is the contributing facto…| Surfing Complexity
(Some of my co-workers call this Lorin’s Law) Even highly reliable systems go down occasionally. After having read over the details of several incidents, I’ve started to notice a patter…| Surfing Complexity
FizzBee is a new formal specification language, originally announced back in May of last year. FizzBee’s author, Jayaprabhakar (JP) Kadarkarai, reached out to me recently and asked me what I …| Surfing Complexity
FAA data shows 30 near-misses at Reagan Airport – NPR, Jan 30, 2025 The amount of attention an incident gets is proportional to the severity of the incident: the greater the impact to the organizat…| Surfing Complexity
The sorry state of dashboards It’s true: the dashboards we use today for doing operational diagnostic work are … let’s say suboptimal. Charity Majors is one of the founders of Hon…| Surfing Complexity
Cloudflare consistently generates the highest quality public incident writeups of any tech company. Their latest is no exception: Cloudflare incident on November 14, 2024, resulting in lost logs. I…| Surfing Complexity
A play in one act Dramatis personae EM, an engineering manager TL, the tech lead for the team X, an engineering manager from a different team Scene 1: A meeting room in an office. The walls are ado…| Surfing Complexity
Justine Tunney recently wrote a blog post titled The Fastest Mutexes where she describes how she implemented mutexes in Cosmopolitan Libc. The post discusses how her implementation uses futexes by …| Surfing Complexity
Back in August, Murat Demirbas published a blog post about the paper by Herlihy and Wing that first introduced the concept of linearizability. When we move from sequential programs to concurrent on…| Surfing Complexity
Here’s a brief excerpt from a talk by David Woods on what he calls the component substitution fallacy (emphasis mine): Everybody is continuing to commit the component substitution fallacy. No…| Surfing Complexity
One of the workhorses of the modern software world is the key-value store. There are key-value services such as Redis or Dynamo, and some languages build key-value data structures right into the l…| Surfing Complexity
I’ve been reading Alex Petrov’s Database Internals to learn more about how databases are implemented. One of the topics covered in the book is a data structure known as the B-tree. Rela…| Surfing Complexity
For databases that support transactions, there are different types of anomalies that can potentially occur: the higher the isolation level, the more classes of anomalies are eliminated (at a cost o…| Surfing Complexity
Cliff L. Biffle blogged a great write-up of a debugging odyssey at Oxide with the title Who killed the network switch? Here’s the bit that jumped out at me: At the time that code was written…| Surfing Complexity
Photo by Matthew Lancaster We know that not all of the services in our system are critical. For example, some of our internal services provide support functions (e.g., observability, analytics), wh…| Surfing Complexity