Let’s start by backfilling some links… tracking down a single-bit memory corruption | Tumblr
With Google’s recent announcement of support for running real Linux apps on Chrome OS, I picked up a Pixelbook, since I’ve long been awaiting the viability of Chromebooks as development machines. After setting up a dev VM and experimenting with various projects, I found that one TensorFlow application I was playing with would lock up, hard, inside the Crostini VM on my Chromebook. After adding some debug prints, I discovered that virtually any calls into numpy.linalg.inv were hanging. I c... | nelhage debugs shit
This story is simplified from an attempt to write an AST walker for a toy compiler, but the essential facts are unchanged. I am fairly new to Rust (but an experienced C++ programmer) and have been trying to pick it up recently. This evening I spent a miserable hour trying to write a function to map an FnMut over a binary tree. I started with a simple tree definition, and tried to write what seemed to me to be the straightforward code: use std::rc::Rc; #[derive(Debug)] enum Tree { Leaf(i32), No...
Recently, on my other blog accidentallyquadratic, I documented a case of accidentally quadratic behavior in /proc/$pid/maps on a wide range of recent Linux kernels. While this bug is amusing, it might initially not seem that important; /proc/$pid/maps is primarily a debugging or inspection tool, and while 30s access times aren’t pleasant, they probably aren’t breaking anything too critical. Today I want to explore, by way of some microbenchmarks, the more pernicious impact of that bug. I ...
This one is a little boring in that it’s not a new bug, but tracking it down was still real exciting. A while back, Stripe started experiencing some serious intermittent sadness with our internal DNS servers. DNS queries would time out or fail to return, and our DNS servers would periodically get OOM-killed, despite no application appearing to use much memory. This incident was during the era of our Consul battles, so we suspected Consul, but were unable to prove its complicity in any way. Finall...
We use Vagrant for development at Stripe, using NFS mounts to share code from the host into the Vagrant dev box. We use bundler to manage Ruby dependencies, and configure it to install gems directly into the project directory, inside the vendor/ subdirectory. Installing gems there, instead of globally, ensures isolation, and preserves gems across re-creation of the Vagrant VM, which means you don’t need to wait for a bunch of gems to download if you blow away your VM. Recently we upgraded o...
I was drinking with some coworkers and mentioned I’d never written a Brainfuck quine. So, of course, as soon as we got back to computers, they started timing me. It took me just over 30 minutes to produce this incredibly verbose quine. I figured I’d do a quick writeup of how it works. If you’re not familiar with Brainfuck, you can get a quick refresher here. It’s incredibly simple, so I won’t go into detail here. Most simple quines can be divided into two parts, which I’ll refer to ...
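The excerpt stops before naming the two parts, and the post's quine is in Brainfuck. As a swapped-in illustration of the split quines are commonly described with (this is a generic Python example, not the post's code): the first line is the data, a string holding a template of the whole program, and the second line is the code, which prints that data twice, once quoted via `%r` and once as-is. The block has no comments because a quine must reproduce its own source exactly.

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Running it prints exactly those two lines.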
I was at GopherCon last week, and the last day, Saturday, was a hack day where people all sat down and just worked on projects, largely in Go. I decided on a whim to play with doing runtime code generation in Go. I’ve done some toy JIT work before in C and C++, so I’m pretty familiar with the space, and it seemed like something fun I hadn’t heard anyone playing with in Go. After a few days of hacking, I produced some working code, complete with a PoC brainfuck JIT, entirely written in G...
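The excerpt doesn't show the Go code. To illustrate the general shape of runtime code generation for Brainfuck, here is a swapped-in Python sketch that translates a program into Python source text and compiles it with `exec()` at runtime. It emits Python rather than machine code, so it is an analogue of a JIT, not a JIT. `compile_bf` is a hypothetical name, and the 30,000-cell tape, 8-bit wrapping cells, and 0-on-end-of-input behavior are assumptions.

```python
def compile_bf(program):
    """Translate a Brainfuck program into a Python function at runtime.

    Emits Python source text and compiles it with exec(); this stands in
    for emitting machine code, so it is an analogue of a JIT, not a real one.
    Assumes a 30,000-cell tape of 8-bit wrapping cells and 0 on end of input.
    """
    ops = {
        ">": "ptr += 1",
        "<": "ptr -= 1",
        "+": "tape[ptr] = (tape[ptr] + 1) % 256",
        "-": "tape[ptr] = (tape[ptr] - 1) % 256",
        ".": "out.append(chr(tape[ptr]))",
        ",": "tape[ptr] = ord(inp[pos]) if pos < len(inp) else 0; pos += 1",
    }
    lines = ["def run(inp):",
             "    tape = [0] * 30000",
             "    ptr = 0",
             "    pos = 0",
             "    out = []"]
    depth = 1  # current indentation level inside run()
    for ch in program:
        pad = "    " * depth
        if ch in ops:
            lines.append(pad + ops[ch])
        elif ch == "[":
            lines.append(pad + "while tape[ptr]:")
            lines.append(pad + "    pass")  # keeps empty loops like "[]" valid
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 1:
                raise ValueError("unmatched ']'")
        # Any other character is a comment in Brainfuck and is skipped.
    if depth != 1:
        raise ValueError("unmatched '['")
    lines.append("    return ''.join(out)")
    namespace = {}
    exec("\n".join(lines), namespace)  # the runtime code-generation step
    return namespace["run"]


run = compile_bf("++++++++[>++++++++<-]>+.")
print(run(""))  # A
print(compile_bf(",[.,]")("hi"))  # hi
```

Because each Brainfuck loop becomes a native `while` loop, the generated function pays the translation cost once and then runs without re-dispatching on each character, which is the basic payoff a real JIT also aims for.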
Recently, I’d noticed a bunch of cases where MongoDB would be far, far slower to build indexes on secondaries than on the primary. An index build would finish in a few hours on a primary, but then take a day or more to build once the indexing operation replicated to a secondary. Eventually I got annoyed enough to decide to debug. I threw perf and PMP at a build that was running on a secondary, and they mostly just informed me that the build was spending most of its time comparing BSON objec...
When Stripe ran our CTF 3.0, I wrote most of level 3, which was a full-text search challenge inspired in part by my own livegrep. I wrote a naïve implementation, which just looped over the files, read them into memory, and used java.lang.String.contains to check if each file contained the “needle”, and we released that implementation as the baseline implementation that contestants needed to improve on. I also wrote a solution that used a simple trigram index, which was the solution you h...
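The excerpt is cut off before describing the trigram solution, so here is a generic sketch of the idea rather than the CTF code (which was Java): index each file by its three-character substrings, intersect the posting sets for the needle's trigrams to get candidate files, then confirm each candidate with the same substring check the naïve baseline used. `TrigramIndex` and its methods are illustrative names.

```python
from collections import defaultdict


def trigrams(text):
    """All distinct 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> set of file names
        self.files = {}                   # file name -> contents

    def add(self, name, contents):
        self.files[name] = contents
        for t in trigrams(contents):
            self.postings[t].add(name)

    def search(self, needle):
        """Return sorted names of files whose contents contain needle."""
        grams = trigrams(needle)
        if not grams:
            # Needles shorter than 3 characters: fall back to scanning every file.
            candidates = set(self.files)
        else:
            # A file can only contain needle if it contains every trigram of needle.
            candidates = set.intersection(
                *(self.postings.get(t, set()) for t in grams))
        # Candidates can be false positives, so verify with a plain substring check.
        return sorted(n for n in candidates if needle in self.files[n])


idx = TrigramIndex()
idx.add("a.txt", "hello world")
idx.add("b.txt", "yellow fellow")
idx.add("c.txt", "say hello")
print(idx.search("hello"))  # ['a.txt', 'c.txt']
```

The index only narrows the search; the final `in` check is still what guarantees correctness, which is why the baseline's logic survives inside the faster solution.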
So we run a bunch of EventMachine at Stripe. I personally hate EventMachine, but it’s what we’ve got, and it’s probably the best answer if you really want async I/O in Ruby. One question you inevitably find yourself asking: How close is my EventMachine worker process to capacity? How many more requests/second can this worker handle? This is, frustratingly, not a super straightforward question. Because of the asynchronous nature of EM, you might have multiple requests logic...
So this morning, a friend was bitching about some Python code he’d inherited and was trying to debug. The author of the code, in a fit of insanity (er, encapsulation), had written code using a bunch of nested closures, like so: def f(): def g(): return "hello this is g" # do something with g() He wanted to poke at this code in a REPL, and in particular, was hoping to call g(), but couldn’t because it wasn’t accessible outside of the function. I made an offhand remark about poking inside the fu...
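The excerpt is cut off before the trick itself. One way to reach `g` from a REPL, sketched here and not necessarily what the post does, is to pull its code object out of `f.__code__.co_consts` and wrap it in a new function with `types.FunctionType`. This assumes `g` captures no variables from `f`; a closure would also need cell objects. The helper `extract_nested` is a hypothetical name, and the `return g()` in `f` stands in for the elided "do something with g()".

```python
import types


def f():
    def g():
        return "hello this is g"
    # do something with g() (hypothetical body)
    return g()


def extract_nested(outer, name):
    """Build a callable from a nested function's code object.

    Only handles nested functions with no free variables, since those
    would need closure cells from a live call of the outer function.
    """
    for const in outer.__code__.co_consts:
        if isinstance(const, types.CodeType) and const.co_name == name:
            if const.co_freevars:
                raise ValueError(f"{name} closes over {const.co_freevars}")
            return types.FunctionType(const, outer.__globals__, name)
    raise LookupError(f"no nested function {name!r} in {outer.__name__}")


g = extract_nested(f, "g")
print(g())  # hello this is g
```

The nested function's code is compiled once along with `f` and stored as a constant; only the function object wrapping it is created at call time, which is why the code object is reachable without ever calling `f`.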
Tracking down a memory leak in Ruby's EventMachine - Made of Bugs: Ruby, EventMachine, gdb, openssl, oh my!
A Very Subtle Bug - Made of Bugs: Trials and tribulations with Python, subprocess, and unix signals