If you have worked at a company that moves fast (or claims to), you've inevitably had to deal with your pipelines breaking because the upstream team decided to change the data schema! If you are (1) frequently in meetings, fixing pipeline issues due to schema changes, (2) stressed, unable to deliver quality work, and always in a hurry to put out the next fire, or (3) working with teams who have to prioritize speed over everything, this post is for you. Constantly dealing with broken pipelines due to upstream...| www.startdataengineering.com
Learn how to configure and optimize incremental models when developing in dbt.| docs.getdbt.com
Data quality is such a broad topic. There are many ways to check the data quality of a dataset, but knowing what checks to run and when can be confusing and unclear. In this post, we will review the main types of data quality checks, where to use them, and what to do if a DQ check fails. By the end of this post, you will not only have a clear understanding of the different types of DQ checks and when to use them, but you'll also be equipped with the knowledge to prioritize which DQ checks to ...| www.startdataengineering.com
You can use a CODEOWNERS file to define individuals or teams that are responsible for code in a repository.| GitHub Docs
Do you need clarification about what Open Table Formats (OTF) are? Is it more than just a pointer to some metadata files that helps you sift through the data quickly? What is the difference between table formats (Apache Iceberg, Apache Hudi, Delta Lake) & file formats (Parquet, ORC)? How do OTFs work? Then this post is for you. Understanding the underlying principle behind open table formats will enable you to deeply understand what happens behind the scenes and make the right decisions when ...| www.startdataengineering.com
Worried about introducing data pipeline bugs, regressions, or breaking changes? Then this post is for you. In this post, you will learn what CI is, why it is crucial to have data tests as part of CI, and how to create a CI pipeline that automatically runs data tests on pull requests using GitHub Actions.| www.startdataengineering.com