A few jobs ago, I worked at a company that collected data from disparate sources, then processed and deduplicated it into spreadsheets for ingestion by the data science and customer support teams. Two questions the engineering team often got were: Why is the data from some input CSV missing from the output? Why doesn't the data in the output CSV match what we expect? To debug these problems, the process was to try to reverse engineer where the data came from, then try to guess which path that data ...