Modern CPUs are actually pretty good at predicting the indirect branch inside an interpreter loop, _contra_ the conventional wisdom. We take a deep dive into the ITTAGE indirect branch prediction algorithm, which is capable of making those predictions, and draw some connections to some other interests of mine in the areas of fuzzing and reinforcement learning.| Made of Bugs
A deep dive into the performance of Python 3.14's tail-call interpreter: How the performance results were confounded by an LLVM regression, the surprising complexity of compiling interpreter loops, and some reflections on performance work, software engineering, and optimizing compilers.| Made of Bugs
How do you find near-duplicates in a massive collection of documents? An exploration of the Jaccard similarity metric, and the MinHash hashing trick used to efficiently approximate it at web scale.| Made of Bugs
A meditation on expertise: Exploring the uses and weaknesses of profilers and profiling, through the lens of Gary Klein's "Seeing the Invisible"| Made of Bugs
Ever since its introduction in the 2017 paper, Attention is All You Need, the Transformer model architecture has taken the deep-learning world by storm. Initially introduced for machine translation, it has become the tool of choice for a wide range of domains, including text, audio, video, and others. Transformers have also driven most of the massive increases in model scale and capability in the last few years. OpenAI’s GPT-3 and Codex models are Transformers, as are DeepMind’s Gopher mo...| Made of Bugs
Introduction This post attempts to describe a mindset I’ve come to realize I bring to essentially all of my work with software. I attempt to articulate this mindset, some of its implications and strengths, and some of the ways in which it’s lead me astray. Software can be understood I approach software with a deep-seated belief that computers and software systems can be understood. This belief is, for me, not some abstruse theoretical assertion, but a deeply felt belief that essentially a...| Made of Bugs
I have spent many years as an software engineer who was a total outsider to machine-learning, but with some curiosity and occasional peripheral interactions with it. During this time, a recurring theme for me was horror (and, to be honest, disdain) every time I encountered the widespread usage of Python pickle in the Python ML ecosystem. In addition to their major security issues1, the use of pickle for serialization tends to be very brittle, leading to all kinds of nightmares as you evolve y...| Made of Bugs
tl;dr “Transparent Hugepages” is a Linux kernel feature intended to improve performance by making more efficient use of your processor’s memory-mapping hardware. It is enabled ("enabled=always") by default in most Linux distributions. Transparent Hugepages gives some applications a small performance improvement (~ 10% at best, 0-3% more typically), but can cause significant performance problems, or even apparent memory leaks at worst. To avoid these problems, you should set enabled=madv...| Made of Bugs
With the ongoing move towards “infrastructure-as-code” and similar notions, there’s been an ongoing increase in the number and popularity of declarative configuration management tools. This post attempts to lay out my mental model of the conceptual architecture and internal layering of such tools, and some wishes I have for how they might work differently, based on this model. Background: declarative configuration management Declarative configuration management refers to the class of to...| Made of Bugs
While talking about thinking about tests and testing in software engineering recently, I’ve come to the conclusion that there are (at least) two major ideas and goals that people have when they test or talk about testing. This post aims to outline what I see as these two schools, and explore some reasons engineers coming from these different perspectives can risk talking past each other. Two reasons to test Testing for correctness The first school of testing comprises those who see testing ...| Made of Bugs
This is the second in an indefinite series of posts about things that I think went well in the Sorbet project. The previous one covered our testing approach. Sorbet is fast. Numerous of our early users commented specifically on how fast it was, and how much they appreciated this speed. Our informal benchmarks on Stripe’s codebase clocked it as typechecking around 100,000 lines of code per second per core, making it one of the fastest production typecheckers we are aware of.| Made of Bugs
At this point in my career, I’ve worked on at least three projects where performance was a defining characteristic: Livegrep, Taktician, and Sorbet (I discussed sorbet in particular last time, and livegrep in an earlier post). I’ve also done a lot of other performance work on the tools I use, some of which ended up on my other blog, Accidentally Quadratic. In this post, I want to reflect on some of the lessons I’ve learned while writing performant software, and working with rather a lot...| Made of Bugs
Last week, I wrote about the mindset that computer systems can be understood, and behaviors can be explained, if we’re willing to dig deep enough into the stack of abstractions our software is built atop. Some of the ensuing discussion on Twitter and elsewhere lead me to write this followup, in which I want to run through a few classes of systems where I’ve found pursuing in-detail understanding of the system wasn’t the right answer.| Made of Bugs
People who work with me tend to realize that I have Opinions about databases, and SQL databases in particular. Last week, I wrote about a Postgres debugging story and tweeted about AWS’ policy ban on internal use of SQL databases, and had occasion to discuss and debate some of those feelings on Twitter; this article is an attempt to write up more of them into a single place I can refer to.| Made of Bugs