If you’ve ever trained a modern machine learning model, there’s a good chance you’ve had to clean a lot of training data. Maybe that means creating captions for millions or billions of images, or maybe it means going through a ton of files and filtering out the low-quality ones. It could be anything, but one constant is that these scripts must handle a lot of input/output perfectly, which is a tall order considering how many ways things can go wrong. If you’ve ever been anxious about ...