This is a data system internals blog post. So if you enjoyed my table formats internals blog posts, or writing on Apache Kafka internals or Apache BookKeeper internals, you might enjoy this one. But beware, it’s long and detailed. Also note that I work for Confluent, which also runs Apache Flink but neither runs nor contributes to Apache Fluss. However, this post aims to be a faithful and objective description of Fluss. Apache Fluss is a table storage engine for Flink being developed by Ali...| Jack Vanlightly
Object storage is taking over more of the data stack, but low-latency systems still need separate hot-data storage. Storage unification is about presenting these heterogeneous storage systems and formats as one coherent resource. Not one storage system and storage format to rule them all, but virtua| Jack Vanlightly
If you’re following the world of AI right now, no doubt you saw Jason Lemkin’s post on social media reporting how Replit’s AI deleted his production database, despite it being told not to touch anything at all due to a code freeze. After deleting his database, the AI even advised him that a rollback would be impossible and the data was gone forever. Luckily, he went against that advice, performed the rollback, and got his data back. Then, a few days later I stumbled on another case, thi...| Jack Vanlightly
A recent LinkedIn post by Nick Lebesis caught my attention with this brutal take on the difference between good startup founders and coward startup founders. I recommend you read the entire thing to fully understand the context, but I’ve pasted the part that most resonated with me below: "| Jack Vanlightly
Building on my previous work on the Coordinated Progress model, this post examines how reliable triggers not only initiate work but also establish responsibility boundaries. Where a reliable trigger exists, a new boundary is created where that trigger becomes responsible for ensuring the event| Jack Vanlightly
{ dist sys, formal verification, event streaming }| Jack Vanlightly
Big data isn’t dead; it’s just going incremental. If you keep an eye on the data space ecosystem like I do, then you’ll be aware of the rise of DuckDB and its message that big data is dead. The idea comes from two industry papers (and associated data sets), one from the Redshift team (paper| Jack Vanlightly
Conway’s Law: "Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure." This is playing out worldwide across hundreds of thousands of organizations, and it is no more evident than in the spl| Jack Vanlightly
Microservices, functions, stream processors and AI agents represent nodes in our graph. An incoming edge represents a trigger of work in the node, and the node must do the work reliably. I have been using the term reliable progress but I might have used durable execution if it hadn’t already been used to define a specific type of tool.| Jack Vanlightly
In part 2, we built a mental framework using a graph of nodes and edges to represent distributed work. Workflows are subgraphs coordinated via choreography or orchestration. Reliability, in this model, means reliable progress: the result of reliable triggers and progressable work. In part 3 we refine this graph model in terms of different types of coupling between nodes, and how edges can be synchronous or asynchronous. Let’s set the scene with an example, then dissect that example with the...| Jack Vanlightly
In part 1, we described distributed computation as a graph and constrained the graph for this analysis to microservices, functions, stream processing jobs and AI Agents as nodes, and RPC, queues, and topics as the edges. Within our definition of The Graph, a node might be a function (FaaS or microservice), a stream processing job, an AI Agent, or some kind of third-party service. An edge might be an RPC channel, a queue or a topic. For a workflow to be reliable, it must be able to make prog...| Jack Vanlightly
At some point, we’ve all sat in an architecture meeting where someone asks, “Should this be an event? An RPC? A queue?”, or “How do we tie this process together across our microservices? Should it be event-driven? Maybe a workflow orchestration?” Cue a flurry of opinions, whiteboard arrows, and vague references to sagas. Now that I work for a streaming data infra vendor, I get asked: “How do event-driven architecture, stream processing, orchestration, and the new durable execution...| Jack Vanlightly
In this latest post of the disaggregated log replication survey, we’re going to look at the Apache BookKeeper Replication Protocol and how it is used by Apache Pulsar to form topic partitions. Raft blends the roles and responsibilities into one monolithic protocol, MultiPaxos separates the monolithic protocol into separate roles, and Apache Kafka separates the protocol and roles into control-plane/data-plane. How do Pulsar and BookKeeper divide and conquer the duties of log replication? Let...| Jack Vanlightly
In this post, we’re going to look at the Kafka Replication Protocol and how it separates control plane and data plane responsibilities. It’s worth noting there are other systems that separate concerns in a similar way, with RabbitMQ Streams being one that I am aware of.| Jack Vanlightly
Over the next series of posts, we'll explore how various real-world systems and some academic papers have implemented log replication with some form of disaggregation. In this first post we’ll look at MultiPaxos. There are no doubt many real-world implementations of MultiPaxos out there, but I want to focus on Neon’s architecture as it is illustrative of the benefits of thinking in terms of logical abstractions and responsibilities when designing complex systems.| Jack Vanlightly
Technology changes can be sudden (like generative AI) or slower juggernauts that kick off a slow chain reaction that takes years to play out. I would place object storage and its enablement of disaggregated architectures in that latter category. The open table formats, such as Apache Iceberg, Delta Lake, and Apache Hudi, form part of this chain reaction, but things aren’t stopping there. I’ve written extensively about the open table formats (OTFs). In my original Tableflow post, I wrote t...| Jack Vanlightly
This post continues my series looking at log replication protocols, within the context of state-machine replication (SMR) or just when the log itself is the product (such as Kafka). So far I’ve been looking at Virtual Consensus, but now I’m going to widen the view to look at how log replication protocols can be disaggregated in general (there are many ways). In the next post, I’ll do a survey of log replication systems in terms of the types of disaggregation described in this post.| Jack Vanlightly
"True stability results when presumed order and presumed disorder are balanced. A truly stable system expects the unexpected, is prepared to be disrupted, waits to be transformed." — Tom Robbins This post continues my series looking at log replication protocols, within the context of state-machine replication (SMR) or just when the log itself is the product (such as Kafka). I’m going to cover some of the same ground from the Introduction to Virtual Consensus in Delos post, but focus on on...| Jack Vanlightly
This is the first of a number of posts looking at log replication protocols, mainly in the context of state machine replication (SMR). This first post will look at a log replication protocol design called Virtual Consensus from the paper: Virtual Consensus in Delos. In 2020, a team of researchers and engineers from Facebook, led by Mahesh Balakrishnan, published their work (linked above) on a log replication design called Virtual Consensus that they had built as the log replication layer of t...| Jack Vanlightly
Rumors are swirling that Snowflake intends to acquire Redpanda and many are questioning why and what impact this might have on Confluent. First, let’s remember that these are just rumors and there’s nothing official. But given that people are speculating, here are my thoughts on how to interpret such an acquisition, whether it ends up happening or not. There are a number of market trends in play right now, such as the rise of Iceberg and open data, as well as the war with Databricks and S...| Jack Vanlightly
Two interesting blog posts about AI agents have caught my attention over the last few weeks. * Anthropic wrote Building Effective Agents. * Chip Huyen wrote Agents. Ethan Mollick has also written some excellent blog posts recently: * The Present Future: AI's Impact Long Before Superintelligence * Prophecies of the Flood * What just happened In this post, I’ll explore what some of the leading experts in this area are saying about AI agents and the challenges ahead.| Jack Vanlightly
I just read Phil Eaton’s post on reaching the 1 million page views milestone, which he was inspired to blog about due to Murat Demirbas doing the same thing back in 2017. I just checked my all time blog stats and it turns out I can write one of these too 😄| Jack Vanlightly
After posting my last Kafka transactions diary entry, JP (the Fizzbee maintainer) wrote a refactored version using non-atomic actions and a different way of representing the network. It’s a very interesting variant and I’m tempted to switch over to his version. When an action is not atomic, execution of an action could yield at any moment to a different action in a different role instance or even the same role instance. With this yielding we can also replace explicit message passing with ...| Jack Vanlightly
So what should we do instead? This is less of a technology problem and more of a structural problem. We can’t just add some missing features to data tooling; it’s about solving a people problem, how we organize together, how team incentives line up, and also about applying well-established software| Jack Vanlightly