If your company has multiple dbt projects, you have probably had to reuse code across projects. Creating cross-project dependencies is not straightforward in a SQL templating system like dbt. If you are wondering how to use seed data defined in one dbt project in another, how dbt packages work under the hood, or what caveats to be aware of when using assets across projects, this post is for you. In this post, we will go over how to use packaging in dbt to reuse assets and how packaging works under the...| www.startdataengineering.com
A significant data project portfolio can help set you apart from the run-of-the-mill candidate. Projects show that you are someone who can learn and adapt. Your portfolio informs a potential employer about your ability to continually learn, your knowledge of data pipeline best practices, and your genuine interest in the data field. Most importantly, it gives you the confidence to pick up new tools and build data pipelines from scratch. But setting up data infrastructure, with coding best pr...| www.startdataengineering.com
You know Python is essential for a data engineer. Does anyone know how much one should learn to become a data engineer? When you're in an interview with a hiring manager, how can you effectively demonstrate your Python proficiency? Imagine knowing exactly how to build resilient and stable data pipelines (using any language). Knowing the foundational ideas for data processing will ensure you can quickly adapt to the ever-changing tools landscape. In this post, we will review the concepts you n...| www.startdataengineering.com
Are you part of an under-resourced team where adding time-saving dbt (data build tool) features takes a back seat to delivering new datasets? Do you want to incorporate time (& money) saving dbt processes but need more time? While focusing on delivery may help in the short term, delivery speed will suffer without a proper workflow! A good workflow will save time, prevent bad data, and ensure high development speed! Imagine the time (& mental pressure) savings if you didn't have to validate ...| www.startdataengineering.com
Struggling to come up with a data engineering project idea? Overwhelmed by all the setup necessary to start building a data engineering project? Don't know where to get data for your side project? Then this post is for you. We will go over the key components, and help you understand what you need to design and build your data projects. We will do this using a sample end-to-end data engineering project.| www.startdataengineering.com
Are you disappointed with online SQL tutorials that aren't deep enough? Are you frustrated knowing that you are missing SQL skills, but can't quite put your finger on it? This post is for you. In this post, we go over a few topics that can take your SQL skills to the next level and help you be a better data engineer.| www.startdataengineering.com
Trying to incorporate testing in a data pipeline? This post is for you. In this post, we go over 4 types of tests to add to your data pipeline to ensure high-quality data. We also go over how to prioritize adding these tests, while developing new features.| www.startdataengineering.com
Frustrated that hiring managers are not reading your GitHub projects? Then this post is for you. In this post, we discuss a way to impress hiring managers by hosting a live dashboard with near real-time data. We will also go over coding best practices such as project structure, automated formatting, and testing to make your code professional. By the end of this post, you will have deployed a live dashboard that you can link to your resume and LinkedIn.| www.startdataengineering.com
Setting up an ELT data-ops workflow with multiple environments for developers is often extremely time-consuming. What if there was a way to speed up this process, so that you could concentrate on modeling your data and delivering value to your end users? The good news is that there is a way. You can leverage dbt Cloud to set up an ELT data-ops workflow in a very short time. In this post, we cover how to set up a data-ops workflow for an ELT system. We will go over how to set up dbt, Snowflake, C...| www.startdataengineering.com
With the advent of powerful data warehouses like Snowflake, BigQuery, Redshift Spectrum, etc. that allow separation of storage and execution, it has become very economical to store data in the data warehouse and then transform it as required. This post goes over how to design such an ELT system using Stitch and dbt. The main objective is to keep the code complexity and server management low, while automating as much as possible.| www.startdataengineering.com
In this article, we go over the reasoning behind why someone might want to use dbt. If you are interested in learning dbt, check out this article. Some common questions from data engineers about dbt are along the lines of: "It is not very clear to me why I would use dbt instead of running SQL queries on Airflow."| www.startdataengineering.com
Using dbt, you can test the output of your SQL transformations. If you have wondered how to "unit test" your SQL transformations in dbt, then this post is for you. In this post, we go over how to write unit tests for your SQL transformations with mock inputs/outputs and test them locally. This helps keep the development cycle shorter and enables you to follow a TDD approach for your SQL-based data pipelines.| www.startdataengineering.com