Over the past decade, every department has wanted to be data-driven, and data engineering teams are under more pressure than ever. If you have been an engineer for more than a few years, you will have seen your world change from a 'well-planned data model' to 'dump everything in S3 and get some data for the end-user'. Data engineers are under a lot of stress caused by: > The business is becoming too complex, and every department wants to become data-driven; thus, expectations from the data tea...| www.startdataengineering.com
If you have worked at a company that moves fast (or claims to), you've inevitably had to deal with your pipelines breaking because the upstream team decided to change the data schema! If you are frequently in meetings fixing pipeline issues due to schema changes; stressed, unable to deliver quality work, and always in a hurry to put out the next fire; or working with teams who have to prioritize speed over everything, this post is for you. Constantly dealing with broken pipelines due to upstream...| www.startdataengineering.com
System design interviews are usually vague and depend on you (as the interviewee) to guide the interviewer. If you are thinking: "How do I prepare for data engineering system design interviews?", "I struggle to think of the questions I would be asked in a data engineering system design interview", "I don't have enough interview experience to know what companies ask", or "Is data engineering 'system design' more than choosing between technologies like Spark and Airflow?", this post is for you! Imagine being a...| www.startdataengineering.com
Data quality checks are critical for any production pipeline. While there are many ways to implement data quality checks, the great_expectations library is a popular choice. If you have wondered 1. How can you effectively use the great_expectations library? 2. Why is the great_expectations library so complex? 3. Why is the great_expectations library so clunky, with so many moving pieces? Then this post is for you. In this post, we will go over the key concepts you’ll need to get up and ru...| www.startdataengineering.com
Data quality is a broad topic. There are many ways to check the data quality of a dataset, but knowing which checks to run, and when, can be confusing. In this post, we will review the main types of data quality (DQ) checks, where to use them, and what to do if a DQ check fails. By the end of this post, you will not only have a clear understanding of the different types of DQ checks and when to use them, but you'll also be equipped with the knowledge to prioritize which DQ checks to ...| www.startdataengineering.com