This is a quick blog post to remind me how to connect Apache Flink to a Kafka topic on Confluent Cloud. You may wonder why you’d want to do this, given that Confluent Cloud for Apache Flink is a much easier way to run Flink SQL. But, for whatever reason, you’re here and you want to understand the necessary incantations to get this connectivity to work. There are two versions of this connectivity: with and without the Schema Registry for Avro. MVP: Just connect to a Kafka topic; n...| rmoff's random ramblings
Not got time for all this? I’ve marked 🔥 for my top reads of the month :)| Interesting links - July 2025
Iceberg nicely decouples storage from ingest and query (yay!). When we say "decouples" it’s a fancy way of saying "doesn’t do". Which, in the case of ingest and query, is really powerful. It means that we can store data in an open format, populated by one or more tools, and queried by the same, or other tools. Iceberg gets to be very opinionated and optimised around what it was built for (storing tabular data in a flexible way that can be efficiently queried). This is amazing! But, what I...| rmoff's random ramblings
Kafka Connect is a framework for data integration, and is part of Apache Kafka.| Writing to Apache Iceberg on S3 using Kafka Connect with Glue catalog
Not got time for all this? I’ve marked 🔥 for my top reads of the month :) Open Table Formats / Data Lakehouses 🔥 Instead of enthusiastically hopping on the Iceberg bandwagon with both webbed feet, DuckDB Labs have been quietly building their own format. DuckLake was announced at the beginning of the month, and is a replacement for both an OTF such as Iceberg and the metadata catalog that an OTF user will invariably need to wire up too. I had a quick poke around it, and Tobias Müller ...| rmoff's random ramblings
I’m using Flink 1.20 because, at the time of writing (2025-06-24), the Iceberg connector doesn’t yet support Flink 2.0 (support is due with Iceberg 1.10.0).| Writing to Apache Iceberg on S3 using Flink SQL with Glue catalog
After a week’s holiday ("vacation", for y’all in the US) without a glance at anything work-related, what joy to return and find that the DuckDB folk have been busy, not only with the recent 1.3.0 DuckDB release, but also with a brand-new project called DuckLake. Here are my brief notes on DuckLake. Getting our ducks in a row Let’s be clear: Naming things is hard. Even so, the DuckLake name is confusing because it implies a tight coupling to DuckDB where there is none (other than the ownershi...| rmoff's random ramblings
Not got time for all this? I’ve marked 🔥 for my top reads of the month :) Data Engineering 🔥 Amongst all the background noise of ETL vs ELT vs ZeroETL, this primer from Ben Rogojan (a.k.a. "The Seattle Data Guy") is a great reminder of the actual 'T' that needs doing to our data, wherever it is that we do it. Ask ten engineers in the data space the difference between job titles and you’ll get a dozen opinions. In a sense this doesn’t matter, but this post is useful for laying out ...| rmoff's random ramblings