Imagine working for a company that processes a few GBs of data every day but spends hours configuring/debugging large-scale data processing systems! Whoever set up the data infrastructure copied it from some blog/talk by big tech. Now, the responsibility of managing the data team's expenses has fallen on your shoulders. You're under pressure to scrutinize every system expense, no matter how small, in an effort to save some money for the organization. It can be frustrating when data vendors ch...| www.startdataengineering.com
Hendrik Makait, Sarah Johnson, Matthew Rocklin 2024-05-14 14 min read We run benchmarks derived from the TPC-H benchmark suite on a variety of scales, hardware architectures, and dataframe projects...| Coiled
Efficient Data Processing in SQLA guide to understanding the core concepts of distributed data storage & processing, analytical functions, and query optimizations in your data warehouse.You want to be able to write efficient data processing pipelines in SQL, but you don't know where to start!There are too many topics to learn to get proficient at efficient data processing in SQL, like optimizing queries, partitioning, parallelism, data modeling, best practices, etc. It is overwhelming to have...| Gumroad
Big data is dead. Long live easy data. | Reading time: 19 min read| MotherDuck
Are you disappointed with online SQL tutorials that aren't deep enough? Are you frustrated knowing that you are missing SQL skills, but can't quite put your finger on it? This post is for you. In this post, we go over a few topics that can take your SQL skills to the next level and help you be a better data engineer.| www.startdataengineering.com
If you are trying to improve your data engineering skills or are the sole data person in your company, it can be hard to know how well your technical skills are developing. Questions like Am I building pipelines the right way? How do I measure up to DEs at bigger tech companies? How do I get feedback on my pipeline design? It can cause a lot of uncertainty in career development! Imagine if you know that your code is on par (or even better than) with pipelines at tech-forward companies and tha...| www.startdataengineering.com
Data pipelines built (and added on to) without a solid foundation will suffer from poor efficiency, slow development speed, long times to triage production issues, and hard testability. What if your data pipelines are elegant and enable you to deliver features quickly? An easy-to-maintain and extendable data pipeline significantly increase developer morale, stakeholder trust, and the business bottom line! Using the correct design pattern will increase feature delivery speed and developer valu...| www.startdataengineering.com