Preconditions: To use Amazon SageMaker Lakehouse with DuckDB, you first have to create an S3 table bucket, a namespace, and an actual S3 Table. All of these steps are described in my other blog post “Query S3 Tables with DuckDB”, so please make sure yo...| tobilg.com
Data Lakes come in many different flavors. AWS, Azure, Google Cloud, Snowflake, Databricks, etc. all have their specialties, strengths, and weaknesses. What they have in common is that most, if not all, of them use Object Storage...| tobilg.com
The General Transit Feed Specification (GTFS) is a standardized, open data format for public transportation schedules and geographic information. In practice, a GTFS feed is simply a ZIP archive of text (CSV) tables - such as stops.txt, routes.txt, a...| tobilg.com
Amazon S3 Tables was launched on December 3rd, 2024, and provides “storage that is optimized for tabular data such as daily purchase transactions, streaming sensor data, and ad impressions in Apache Iceberg format”. While S3 Tables can be queried ...| tobilg.com
DuckDB has gained a new preview feature that allows querying Iceberg data in AWS S3 Tables. Setting up an S3 Table: There are multiple steps that need to be performed to set up an S3 Table that can then be queried with tools like DuckDB. As the ...| tobilg.com
I had a use case that eventually required performing IP address lookups in a given list of CIDR ranges, as I maintain an open source project that gathers IP address range data from public cloud providers, and also wrote an article in my blog about an...| tobilg.com
A while ago I published sql-workbench.com and the accompanying blog post called "Using DuckDB-WASM for in-browser Data Engineering". The SQL Workbench enables its users to analyze local or remote data directly in the browser. This lowers the bar rega...| tobilg.com
Introduction: DuckDB, the in-process DBMS specialized in OLAP workloads, has grown very rapidly during the last year, both in functionality and in popularity amongst its users, as well as with developers that contribute many projects to the Open Sou...| tobilg.com
What is Lambda@Edge AWS Lambda@Edge is an extension of the traditional AWS Lambda service, but with a crucial twist – it brings serverless computing capabilities closer to the end-users. In essence, Lambda@Edge empowers developers to run custom code ...| tobilg.com
As the Skillbuilder website is sometimes a bit hard to navigate, here's the full list of free badges you can earn on AWS Skillbuilder: AWS Knowledge: Cloud Essentials AWS Knowledge: Architecting AWS Knowledge: Serverless AWS Knowledge: Object Stora...| tobilg.com
Introduction In today's data-driven world, interactive and visually appealing web-based maps have become an integral part of countless applications and services. Whether it's for navigation, location-based services, or data visualization, delivering ...| tobilg.com
This article explains how to gather and analyze public cloud provider IP address data with DuckDB and Observable| tobilg.com
Using AWS Serverless services and DuckDB as near-realtime Data Lake backend infrastructure| tobilg.com
A common task in S3-based Data Lakes is to repartition data, to optimize query patterns and speed. This article describes a serverless solution using DuckDB| tobilg.com
How to run DuckDB in a serverless way on AWS Lambda, with a custom layer.| tobilg.com
How to build a global reverse proxy with on-demand SSL support on AWS| tobilg.com
No, this article is not about buying properties close to lakes...| tobilg.com