Over the past decade, every department has wanted to be data-driven, and data engineering teams are under more pressure than ever. If you have been an engineer for more than a few years, you have seen your world change from a 'well-planned data model' to 'dump everything in S3 and get some data for the end-user'. Data engineers are under a lot of stress caused by: The business is becoming too complex, and every department wants to become data-driven; thus, expectations from the data tea...| www.startdataengineering.com
1. Introduction 2. Code & Data 3. Using nested data types effectively 3.1. Use STRUCT for one-to-one & hierarchical relationships 3.2. Use ARRAY[STRUCT] for one-to-many relationships 3.3. Using nested data types in data processing 3.3.1. STRUCT enables more straightforward data schema and data access 3.3.2. Nested data types can be sorted 3.3.3. UNNEST ARRAY to rows and GROUP rows to ARRAY 3.| Posts on Start Data Engineering
Evolution| iceberg.apache.org
Data quality is a broad topic. There are many ways to check the data quality of a dataset, but knowing which checks to run and when can be confusing. In this post, we will review the main types of data quality checks, where to use them, and what to do if a DQ check fails. By the end of this post, you will not only have a clear understanding of the different types of DQ checks and when to use them, but you'll also be equipped with the knowledge to prioritize which DQ checks to ...| www.startdataengineering.com
Do you need clarification about what Open Table Formats (OTF) are? Is it more than just a pointer to some metadata files that helps you sift through the data quickly? What is the difference between table formats (Apache Iceberg, Apache Hudi, Delta Lake) & file formats (Parquet, ORC)? How do OTFs work? Then this post is for you. Understanding the underlying principle behind open table formats will enable you to deeply understand what happens behind the scenes and make the right decisions when ...| www.startdataengineering.com
If you are trying to improve your data engineering skills or are the sole data person in your company, it can be hard to know how well your technical skills are developing. Questions like "Am I building pipelines the right way?", "How do I measure up to DEs at bigger tech companies?", and "How do I get feedback on my pipeline design?" can cause a lot of uncertainty in career development! Imagine if you knew that your code was on par with (or even better than) pipelines at tech-forward companies and tha...| www.startdataengineering.com