Are you a data engineer(or new to data space) wondering why one may need to use Apache Airflow vs. just using cron? Does Apache Airflow feel like an over-optimized solution for a simple problem? Then this post is for you. Understanding the critical features necessary for a data pipelining system will ensure that your output is high quality! Imagine knowing exactly what a complex orchestration system brings to the table; you can make the right tradeoffs for your data architecture. This post wi...| www.startdataengineering.com
An in-significant data project portfolio can help set you apart from the run-of-a-mill candidate. Projects show that you are someone who can learn and adapt. Your portfolio informs a potential employer about your ability to continually learn, your knowledge of data pipeline best practices, and your genuine interest in the data field. Most importantly, it gives you the confidence to pick up new tools and build data pipelines from scratch. But setting up data infrastructure, with coding best pr...| www.startdataengineering.com
You know Python is essential for a data engineer. Does anyone know how much one should learn to become a data engineer? When you're in an interview with a hiring manager, how can you effectively demonstrate your Python proficiency? Imagine knowing exactly how to build resilient and stable data pipelines (using any language). Knowing the foundational ideas for data processing will ensure you can quickly adapt to the ever-changing tools landscape. In this post, we will review the concepts you n...| www.startdataengineering.com
Setting up data infra is one of the most complex parts of starting a data engineering project. Overwhelmed trying to set up data infrastructure with code? Or using dev ops practices such as CI/CD for data pipelines? In that case, this post will help! This post will cover the critical concepts of setting up data infrastructure, development workflow, and sample data projects that follow this pattern. We will also use a data project template that runs Airflow, Postgres, & Metabase to demonstrate...| www.startdataengineering.com
Struggling to come up with a data engineering project idea? Overwhelmed by all the setup necessary to start building a data engineering project? Don't know where to get data for your side project? Then this post is for you. We will go over the key components, and help you understand what you need to design and build your data projects. We will do this using a sample end-to-end data engineering project.| www.startdataengineering.com
Trying to incorporate testing in a data pipeline? This post is for you. In this post, we go over 4 types of tests to add to your data pipeline to ensure high-quality data. We also go over how to prioritize adding these tests, while developing new features.| www.startdataengineering.com
Frustrated that hiring managers are not reading your Github projects? then this post is for you. In this post, we discuss a way to impress hiring managers by hosting a live dashboard with near real-time data. We will also go over coding best practices such as project structure, automated formatting, and testing to make your code professional. By the end of this post, you will have deployed a live dashboard that you can link to your resume and LinkedIn.| www.startdataengineering.com
Wondering how to backfill an hourly SQL query in Apache Airflow ? Then, this post is for you. In this post we go over how to manipulate the execution_date to run backfills with any time granularity. We use an hourly DAG to explain execution_date and how you can manipulate them using Airflow macros.| www.startdataengineering.com
Frustrated with handling data type conversion issues in python? Then this post is for you. In this post, we go over a reusable data type conversion pattern using Pydantic. We will also go over the caveats involved in using this library.| www.startdataengineering.com
Unable to find practical examples of idempotent data pipelines? Then, this post is for you. In this post, we go over a technique that you can use to make your data pipelines professional and data reprocessing a breeze.| www.startdataengineering.com