As a data engineer, you would have spent hours trying to figure out the right place to make a change in your repository—I know I have. > You think, "Why is it so difficult to make a simple change?". > You push a simple change (with tests, by the way), and suddenly, production issues start popping up! > Dealing with on-call issues when your repository is spaghetti code with multiple layers of abstracted logic is a special hell that makes data engineers age in dog years! > Messy code leads to...| www.startdataengineering.com
System design interviews are usually vague and depend on you (as the interviewee) to guide the interviewer. If you are thinking: How do I prepare for data engineering system design interviews? I struggle to think of questions you would ask in a system design interview for data engineering; I don't have enough interview experience to know what companies ask. Is data engineering "system design" more than choosing between technologies like Spark and Airflow? This post is for you! Imagine being a...| www.startdataengineering.com
Imagine working for a company that processes a few GBs of data every day but spends hours configuring/debugging large-scale data processing systems! Whoever set up the data infrastructure copied it from some blog/talk by big tech. Now, the responsibility of managing the data team's expenses has fallen on your shoulders. You're under pressure to scrutinize every system expense, no matter how small, in an effort to save some money for the organization. It can be frustrating when data vendors ch...| www.startdataengineering.com
Imagine this scenario: You are on call when suddenly an obscure alert pops up. It just says that your pipeline failed but has no other information. The pipelines you inherited (or didn't build) seem like impenetrable black boxes. When they break, it's a mystery—why did it happen? Where did it go wrong? The feeling is palpable: frustration and anxiety mount as you scramble to resolve the issue swiftly. It's a common struggle, especially for new team members who have yet to unravel the system...| www.startdataengineering.com
Stream processing differs from batch; one needs to be mindful of the system's memory, event order, and system recovery in case of failures. However, understanding the fundamental concepts of time attributes, cluster memory, time-bounded joins, and system monitoring will enable you to build resilient and efficient streaming pipelines. If you are looking for an end-to-end streaming tutorial or a project to understand the foundational skills required to build streaming pipelines, this post is fo...| www.startdataengineering.com
As data engineers, you might have heard the terms functional data pipeline, factory pattern, singleton pattern, etc. One can quickly look up the implementation, but it can be tricky to understand what they are precisely and when to (& when not to) use them. Blindly following a pattern can help in some cases, but not knowing the caveats of a design will lead to hard-to-maintain and brittle code! While writing clean and easy-to-read code takes years of experience, you can accelerate that by und...| www.startdataengineering.com