So, the classic newbie question. DuckDB vs Polars, which one should you pick? This is an interesting question, and actually drives a lot of search traffic to this website on which you find yourself wasting time. I thank you for that. This is probably the most classic type of question that all developers eventually ask […] The post DuckDB vs Polars. Wait. DuckDB and Polars. appeared first on Confessions of a Data Guy.| Confessions of a Data Guy
Well, all the bottom feeders (Iceberg and DuckDB users) are howling at the moon and dancing around a bonfire at midnight trying to cast their evil spells on the rest of us. Apache Iceberg writes with DuckDB? Better late than never I suppose. Not going to lie, Iceberg writes with MotherDuck is an interesting concept. […] The post Apache Iceberg Writes with DuckDB (or not) appeared first on Confessions of a Data Guy.| Confessions of a Data Guy
Statically compiling DuckDB can improve security, improve startup time, and support offline environments.| Colin Breck
Learn Delta tables with ColumnMapping in Polars, addressing solutions, performance, and efficiency using alternative methods| Sandeep Pawar | Microsoft Fabric
Preconditions: To use the Amazon SageMaker Lakehouse with DuckDB, you first have to create an S3 Table bucket, a namespace, and an actual S3 Table. All those steps are described in my other blog post “Query S3 Tables with DuckDB”, so please make sure yo...| tobilg.com
Data Lakes come in a broad variety of flavors. AWS, Azure, Google Cloud, Snowflake, Databricks, etc. all have their specialties, strengths, and weaknesses. Common among them is that most, if not all, of them use Object Storage...| tobilg.com
The General Transit Feed Specification (GTFS) is a standardized, open data format for public transportation schedules and geographic information. In practice, a GTFS feed is simply a ZIP archive of text (CSV) tables - such as stops.txt, routes.txt, a...| tobilg.com
I had a use case that eventually required performing IP address lookups in a given list of CIDR ranges, as I maintain an open source project that gathers IP address range data from public cloud providers, and also wrote an article in my blog about an...| tobilg.com
A while ago I published sql-workbench.com and the accompanying blog post called "Using DuckDB-WASM for in-browser Data Engineering". The SQL Workbench enables its users to analyze local or remote data directly in the browser. This lowers the bar rega...| tobilg.com
Introduction: DuckDB, the in-process DBMS specialized in OLAP workloads, has grown very rapidly during the last year, both in functionality and in popularity amongst its users, as well as with developers that contribute many projects to the Open Sou...| tobilg.com
This article explains how to gather and analyze public cloud provider IP address data with DuckDB and Observable| tobilg.com
Using AWS Serverless services and DuckDB as near-realtime Data Lake backend infrastructure| tobilg.com
A common task in S3-based Data Lakes is to repartition data, to optimize query patterns and speed. This article describes a serverless solution using DuckDB| tobilg.com
How to run DuckDB in a serverless way on AWS Lambda, with a custom layer.| tobilg.com
A few forward looking SQL dialects have started introducing lambda expressions to be used with functions operating on arrays| Java, SQL and jOOQ.
New dialects: jOOQ 3.20 ships with 2 new experimental dialects: ClickHouse is a fast-moving SQL dialect with a historic vendor-specific syntax that is being gradually migrated to a more standards-compliant alternative, which is why our support is still experimental. A lot of behaviours differ from what one would expect elsewhere, including NULL handling, which is … Continue reading jOOQ 3.20 released with ClickHouse, Databricks, and much more DuckDB support, new modules, Oracle type hierarchies, ...| Java, SQL and jOOQ.
The article discusses the evolution of business intelligence (BI) tools, questioning their longevity compared to the enduring nature of spreadsheets. It highlights how spreadsheets facilitated decision-making in the past but have become inadequate as data complexity increased. The author envisions a future "Spreadsheets 2.0," integrating advanced features, better orchestration, and AI support to revitalize the role of spreadsheets in data workflows.| DataDuel.co
Patrick Hoefler| Blog
There are significant changes happening in distributed systems.| Colin Breck
In the ever-evolving landscape of data management, DuckDB has carved out a niche for itself as a powerful analytical database designed for efficient in-process data analysis. It is particularly wel…| Shekhar Gulati
TigerEye releases open-source DuckDB.dart to simplify data-intensive application development - SiliconANGLE| SiliconANGLE
DuckDB and the R ecosystem| josiahparry.com
We recently pushed out two new and experimental features Coiled Jobs| phofl.github.io