Data quality checks are critical for any production pipeline. While there are many ways to implement data quality checks, the greatexpectations library is one of the popular ones. If you have wondered 1. How can you effectively use the greatexpectations library? 2. Why is the greatexpectations library so complex? 3. Why is the greatexpectations library so clunky and has many moving pieces? Then this post is for you. In this post, we will go over the key concepts you’ll need to get up and ru...| www.startdataengineering.com
Data quality is such a broad topic. There are many ways to check the data quality of a dataset, but knowing what checks to run and when can be confusing and unclear. In this post, we will review the main types of data quality checks, where to use them, and what to do if a DQ check fails. By the end of this post, you will not only have a clear understanding of the different types of DQ checks and when to use them, but you'll also be equipped with the knowledge to prioritize which DQ checks to ...| www.startdataengineering.com
Trying to incorporate testing in a data pipeline? This post is for you. In this post, we go over 4 types of tests to add to your data pipeline to ensure high-quality data. We also go over how to prioritize adding these tests, while developing new features.| www.startdataengineering.com
If you are trying to improve your data engineering skills or are the sole data person in your company, it can be hard to know how well your technical skills are developing. Questions like Am I building pipelines the right way? How do I measure up to DEs at bigger tech companies? How do I get feedback on my pipeline design? It can cause a lot of uncertainty in career development! Imagine if you know that your code is on par (or even better than) with pipelines at tech-forward companies and tha...| www.startdataengineering.com