“It used to be said there were two kinds of chairs to go with two kinds of Ministers: one sort that folds up instantly, the other sort goes round and round in circles.” — Bernard Woolley, Yes Minister [when the minister asks for a new chair]| mahesh’s blog
Since moving from academic research to industry in 2017, I’ve worked on two software projects. Each one started as a small, clean-slate skunkworks effort involving 2-3 people and gradually expanded to a large, conventional software engineering effort with dozens of engineers. The first of these (from 2017 to 2021) was Delos at Meta, a Chubby/ZooKeeper/etcd-like control plane storage system. The second was a new Kafka engine (from 2022 to 2024) that can run on any disaggregated storage laye...
I posit that most software engineers (particularly those working on infrastructural systems) are destined to wallow in unnecessary complexity due to three fundamental laws.
Early in my research career, I had a chance to work with some of the best system researchers in the world on a number of really interesting system designs. One of the enjoyable aspects of research was the particular process used by researchers (particularly in the SOSP/OSDI community) to come up with novel yet practical designs. This design process can be characterized as “fighting complexity with abstraction”: in any complex environment, how do you corral that complexity into cleanly de...
There are at least five distinct paradigms for replication: Group Communication [0], Viewstamped Replication [1], MultiPaxos [2], Raft [3], and Shared Logs [4]. In a previous post, I did a deep-dive on MultiPaxos, showing that it implements a specific abstraction: State Machine Replication or SMR. The SMR API allows servers to propose commands and play them back in a durable total order:
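To make the propose/playback shape concrete, here is a minimal sketch in Python. It is not the API from the post (the excerpt cuts off before showing it): the class name `InMemorySMR`, the method names `propose` and `playback`, and the counter example are all assumptions made for illustration. The toy runs in one process, with no replication and no durability, so it shows only the interface and the total order.

```python
from typing import Callable, List


class InMemorySMR:
    """Hypothetical single-process stand-in for an SMR log.

    Assumption: a real implementation would replicate and persist the log;
    this toy keeps it in a Python list to show only the API shape.
    """

    def __init__(self) -> None:
        self._log: List[str] = []

    def propose(self, command: str) -> int:
        """Append a command and return its position in the total order."""
        self._log.append(command)
        return len(self._log) - 1

    def playback(self, apply: Callable[[int, str], None], start: int = 0) -> int:
        """Feed commands to `apply` in log order; return the next unread position."""
        for pos in range(start, len(self._log)):
            apply(pos, self._log[pos])
        return len(self._log)


def replay_counter(smr: InMemorySMR) -> int:
    """Rebuild a counter state machine by replaying the whole log."""
    state = {"value": 0}

    def apply(_pos: int, command: str) -> None:
        if command == "incr":
            state["value"] += 1
        elif command == "decr":
            state["value"] -= 1

    smr.playback(apply)
    return state["value"]
```

Because every server that plays back the same log applies the same commands in the same order, each one arrives at the same state. In this sketch, calling `replay_counter` twice on the same log returns the same value.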
There are three questions to ask of any system: What abstraction does it implement? What is the design space for such an abstraction? Why is this abstraction useful? In a previous post, we examined the Paxos protocol and answered the first two questions. Paxos implements the abstraction of a Write-once Register (a WOR) using a combination of quorums and a two-phase locking protocol. As for the third question: Paxos is useful because it can be used to implement MultiPaxos (among other things)....
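The Write-once Register abstraction can be sketched in a few lines of Python. This shows only the abstraction's contract, not Paxos itself: there are no quorums, no two-phase protocol, and no fault tolerance. The class name `WriteOnceRegister` and the choice to have `write` return the value that won are assumptions made for illustration.

```python
from typing import Optional


class WriteOnceRegister:
    """Hypothetical single-process WOR: the first write wins and later writes are ignored."""

    def __init__(self) -> None:
        self._value: Optional[str] = None
        self._written = False

    def write(self, value: str) -> str:
        """Try to write; return whichever value the register ends up holding."""
        if not self._written:
            self._value = value
            self._written = True
        assert self._value is not None  # holds once any write has happened
        return self._value

    def read(self) -> Optional[str]:
        """Return the written value, or None if nothing has been written yet."""
        return self._value
```

One way to see the link to MultiPaxos under these assumptions: a sequence of such registers, one per log position, gives each position exactly one command that never changes.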
The road to Paxos is a long one (as with other Greek islands) and also somewhat elusive (it’s an island, after all). It took me longer than I’d like to admit to obtain a working understanding of the Paxos protocol. In my early attempts, I’d hit a brick wall of complexity: do I really need to know what this particular acceptor is going to do? What’s a learner anyway? What does it even mean to decide a value? Why do I need all these ballot numbers? In systems, we deal with complexity vi...
In 2017, I went to Facebook on a sabbatical from my faculty position at Yale. I created a team to build a storage system called Delos at the bottom of the Facebook stack (think of it as Facebook’s version of Chubby). We hit production with a 3-person team in less than a year, and subsequently scaled the team to 30+ engineers spanning multiple sub-teams. In the four years that I led the team (until Spring 2021), we did not experience a single severe outage (nothing higher than a SEV3). The D...
I’ve heard multiple times that a strong notion of leadership somehow simplifies replication. I don’t think this is true. I explain why in this post.