You know, I did fight it for a long time, and I’m still fighting it. Look, no one wants to become a Terraform engineer; that is pain and suffering. But we all understand the benefits of IaC (infrastructure as code), and we SHOULD be using it in our daily tech lives, or pushing towards it. But […] The post The Era of the YAML Engineer appeared first on Confessions of a Data Guy.| Confessions of a Data Guy
So, the classic newbie question. DuckDB vs Polars, which one should you pick? This is an interesting question, and actually drives a lot of search traffic to this website on which you find yourself wasting time. I thank you for that. This is probably the most classic type of question that all developers eventually ask […] The post DuckDB vs Polars. Wait. DuckDB and Polars. appeared first on Confessions of a Data Guy.
Well, all the bottom feeders (Iceberg and DuckDB users) are howling at the moon and dancing around a bonfire at midnight trying to cast their evil spells on the rest of us. Apache Iceberg writes with DuckDB? Better late than never, I suppose. Not going to lie, Iceberg writes with MotherDuck is an interesting concept. […] The post Apache Iceberg Writes with DuckDB (or not) appeared first on Confessions of a Data Guy.
So, you’re just a regular old Data Engineer crawling along through the data muck, barely keeping your head above the bits and bytes threatening to drown you. At one point in time you were full of spit and vinegar and enjoyed understanding and playing with every nuance known to man. But now you are old and […] The post How to tune Spark Shuffle Partitions. appeared first on Confessions of a Data Guy.
Ok, not going to lie, I rarely find anything of value in the dregs of r/dataengineering, mostly, I fear, because it’s 90% freshers with little to no experience. These are green-behind-the-ears, know-it-all engineers who’ve never written a line of Perl or SSH’d into a server, and who have no idea what a LAMP stack is. […]
I was recently working on a PySpark pipeline in which I was using the JDBC option to write about 22 million records from a Spark DataFrame into a Postgres RDS database. Hey, why not use the built-in method provided by Spark? How bad could it be? I mean it’s not like the creators and […] The post The Fastest Way to Insert Data to Postgres appeared first on Confessions of a Data Guy.
Did you know that Polars, the Rust-based DataFrame tool that is one of the fastest tools on the market today, just got faster?? GPU execution is now available in Polars, making it 70% faster than before!! The post Polars on GPU: Blazing Fast DataFrames for Engineers appeared first on Confessions of a Data Guy.