You know Python is essential for a data engineer. Does anyone know how much one should learn to become a data engineer? When you're in an interview with a hiring manager, how can you effectively demonstrate your Python proficiency? Imagine knowing exactly how to build resilient and stable data pipelines (using any language). Knowing the foundational ideas for data processing will ensure you can quickly adapt to the ever-changing tools landscape. In this post, we will review the concepts you n...| www.startdataengineering.com
Source code: Lib/csv.py The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. CSV format was used for many years prior to att...| Python documentation
Source code: Lib/json/__init__.py JSON (JavaScript Object Notation), specified by RFC 7159(which obsoletes RFC 4627) and by ECMA-404, is a lightweight data interchange format inspired by JavaScript...| Python documentation
Source code: Lib/copy.py Assignment statements in Python do not copy objects, they create bindings between a target and an object. For collections that are mutable or contain mutable items, a copy ...| Python documentation
As data engineers, you might have heard the terms functional data pipeline, factory pattern, singleton pattern, etc. One can quickly look up the implementation, but it can be tricky to understand what they are precisely and when to (& when not to) use them. Blindly following a pattern can help in some cases, but not knowing the caveats of a design will lead to hard-to-maintain and brittle code! While writing clean and easy-to-read code takes years of experience, you can accelerate that by und...| www.startdataengineering.com
Data engineering project for beginners, using AWS Redshift, Apache Spark in AWS EMR, Postgres and orchestrated by Apache Airflow.| www.startdataengineering.com
Struggling to come up with a data engineering project idea? Overwhelmed by all the setup necessary to start building a data engineering project? Don't know where to get data for your side project? Then this post is for you. We will go over the key components, and help you understand what you need to design and build your data projects. We will do this using a sample end-to-end data engineering project.| www.startdataengineering.com
Worried about introducing data pipeline bugs, regressions, or introducing breaking changes? Then this post is for you. In this post, you will learn what CI is, why it is crucial to have data tests as part of CI, and how to create a CI pipeline that automatically runs data tests on pull requests using Github Actions.| www.startdataengineering.com
Trying to incorporate testing in a data pipeline? This post is for you. In this post, we go over 4 types of tests to add to your data pipeline to ensure high-quality data. We also go over how to prioritize adding these tests, while developing new features.| www.startdataengineering.com
Frustrated that hiring managers are not reading your Github projects? then this post is for you. In this post, we discuss a way to impress hiring managers by hosting a live dashboard with near real-time data. We will also go over coding best practices such as project structure, automated formatting, and testing to make your code professional. By the end of this post, you will have deployed a live dashboard that you can link to your resume and LinkedIn.| www.startdataengineering.com
Working with a dataset that is too large to fit in memory? Then this post is for you. In this post, we will write memory efficient data pipelines using python generators. We also cover the common generator patterns you will need for your data pipelines.| www.startdataengineering.com
Ensure your data meets basic and business specific data quality constraints. In this post we go over a data quality testing framework called great expectations, which provides powerful functionality to cover the most common test cases and the ability to group them together and run them.| www.startdataengineering.com
Author, A.M. Kuchling < amk@amk.ca>,. Abstract: This document is an introductory tutorial to using regular expressions in Python with the re module. It provides a gentler introduction than the corr...| Python documentation
While The Python Language Reference describes the exact syntax and semantics of the Python language, this library reference manual describes the standard library that is distributed with Python. It...| Python documentation
DBT (data build tool) tutorial. Build a project simulating a real life ELT project using the data build tool.| www.startdataengineering.com
This module implements a number of iterator building blocks inspired by constructs from APL, Haskell, and SML. Each has been recast in a form suitable for Python. The module standardizes a core set...| Python documentation
Source code: Lib/functools.py The functools module is for higher-order functions: functions that act on or return other functions. In general, any callable object can be treated as a function for t...| Python documentation