Not got time for all this? I’ve marked 🔥 for my top reads of the month :) Tip: You can find previous editions of Interesting Links here. Data Engineering 🔥 Ben Rogojan (a.k.a. SeattleDataGuy) has a great list of 5 Things in Data Engineering That Still Hold True After 10 Years (guess what: data modelling matters, if you start with crap data you’ll end with crap data, and so on…). Veronika Durgin shares some good tips for building resilient data pipelines. Some good pointers for why y...| rmoff's random ramblings
How many topics do you have? | Kafka to Iceberg - Exploring the Options
Aug 21, 2025| rmoff.net
This is a quick blog post to remind me how to connect Apache Flink to a Kafka topic on Confluent Cloud. You may wonder why you’d want to do this, given that Confluent Cloud for Apache Flink is a much easier way to run Flink SQL. But, for whatever reason, you’re here and you want to understand the necessary incantations to get this connectivity to work. There are two versions of this connectivity - with, and without, using the Schema Registry for Avro. MVP: Just connect to a Kafka topic; n...| rmoff's random ramblings
Not got time for all this? I’ve marked 🔥 for my top reads of the month :)| Interesting links - July 2025
Iceberg nicely decouples storage from ingest and query (yay!). When we say "decouples" it’s a fancy way of saying "doesn’t do". Which, in the case of ingest and query, is really powerful. It means that we can store data in an open format, populated by one or more tools, and queried by the same, or other tools. Iceberg gets to be very opinionated and optimised around what it was built for (storing tabular data in a flexible way that can be efficiently queried). This is amazing! But, what I...| rmoff's random ramblings
Kafka Connect is a framework for data integration, and is part of Apache Kafka.| Writing to Apache Iceberg on S3 using Kafka Connect with Glue catalog
Not got time for all this? I’ve marked 🔥 for my top reads of the month :) Open Table Formats / Data Lakehouses 🔥 Instead of enthusiastically hopping on the Iceberg bandwagon with both webbed feet, DuckDB Labs have been quietly building their own format. DuckLake was announced at the beginning of the month, and is a replacement for both an OTF such as Iceberg and the metadata catalog that an OTF user will invariably need to wire up too. I had a quick poke around it, and Tobias Müller ...| rmoff's random ramblings
I’m using Flink 1.20, since as of the time of writing (2025-06-24) the Iceberg connector doesn’t yet support Flink 2.0 (it’s due with Iceberg 1.10.0).| Writing to Apache Iceberg on S3 using Flink SQL with Glue catalog
After a week’s holiday ("vacation", for y’all in the US) without a glance at anything work-related, what joy to return and find that the DuckDB folk have been busy, not only with the recent 1.3.0 DuckDB release, but also a brand new project called DuckLake. Here are my brief notes on DuckLake. Getting our ducks in a row Let’s be clear: Naming things is hard. Even so, the DuckLake name is confusing because it implies a tight coupling to DuckDB where there is none (other than the ownershi...| rmoff's random ramblings
Not got time for all this? I’ve marked 🔥 for my top reads of the month :) Data Engineering 🔥 Amongst all the background noise of ETL vs ELT vs ZeroETL, this primer from Ben Rogojan (a.k.a. "The Seattle Data Guy") is a great reminder of the actual 'T' that needs doing to our data, wherever it is that we do it. Ask ten engineers in the data space the difference between job titles and you’ll get a dozen opinions. In a sense this doesn’t matter, but this post is useful for laying out ...| rmoff's random ramblings
It’s time… | It’s Time We Talked About Time: Exploring Watermarks (And More) In Flink SQL
So. Many. Interesting. Links. Not got time for all this? I’ve marked 🔥 for my top reads of the month :)| Interesting links - April 2025
Create a statement | Confluent Cloud for Apache Flink - Exploring the API
The problem with publishing February’s interesting links at the beginning of the month and now getting around to publishing March’s at the end is that I have nearly two months' worth of links to share 😅 So with no further ado, let’s crack on.| Interesting links - March 2025
I wrote a couple of weeks ago about using DuckDB and Rill Data to explore a new data source that I’m working with.| Kicking the tyres on the new DuckDB UI
Let’s imagine we’ve got a source of data with a nested array of multiple values.| How to explode nested arrays with Flink SQL
I was exploring some new data, joining across multiple tables, and doing a simple SELECT * as I’d not worked out yet which columns I actually wanted.| DuckDB tricks - renaming fields in a SELECT * across tables
Here’s a bunch of interesting links and articles about data that I’ve come across recently.| Interesting links - February 2025
The idea here is that you use Asciidoc’s inline pass macro to embed HTML comments (<!-- remember these? -->) in the generated HTML, which then passes the commands to Vale like vale off:| Disabling Vale Linting Selectively in Asciidoc
At Current 24 a few of us will be going for an early run (or walk) on Tuesday morning. Everyone is very welcome!| Current 2024 - 5k Fun Run (or Walk)