This is a data system internals blog post. So if you enjoyed my table formats internals blog posts, or writing on Apache Kafka internals or Apache BookKeeper internals, you might enjoy this one. But beware, it’s long and detailed. Also note that I work for Confluent, which also runs Apache Flink but does not run nor contributes to Apache Fluss. However, this post aims to be a faithful and objective description of Fluss. Apache Fluss is a table storage engine for Flink being developed by Ali...| Jack Vanlightly
Object storage is taking over more of the data stack, but low-latency systems still need separate hot-data storage. Storage unification is about presenting these heterogeneous storage systems and formats as one coherent resource. Not one storage system and storage format to rule them all, but virtua| Jack Vanlightly
rqlite is a lightweight, user-friendly, open-source, distributed relational database. It’s written in Go and uses SQLite as its storage engine. When it comes to distributed systems the CAP theorem is an essential concept. It states that it's impossible for a distributed database to simultaneously provide Consistency, Availability, and Partition tolerance. The challenge is in the face of a network partition,…| Vallified
In this post I’ll share my notes on the book Software Architecture: The Hard Parts by Neal Ford, Mark Richards, Pramod Sadalage and Zhamak Dehghani. In summary, this book presents trade-offs between different ways to implement microservices. The authors present general guidance on how to split services and databases, and the tradeoffs involved. The book has about 400 pages and 15 chapters and in this post I go over each of the chapters and then provide a summary and impressions at the end.| NP-Incompleteness
Building on my previous work on the Coordinated Progress model, this post examines how reliable triggers not only initiate work but also establish responsibility boundaries . Where a reliable trigger exists, a new boundary is created where that trigger becomes responsible for ensuring the event| Jack Vanlightly
Understand commands, events, and brokers to build real-time APIs. Learn how to avoid polling and design more responsive.| Rico Fritzsche
Zero-Knowledge Proofs (ZKPs), which generate evidence for a third party to confirm the accurate execution of a computation, and Fully Homomorphic Encryption (FHE), which enables calculations on encrypted data, will be combined with distributed systems algorithms, that are capable of tolerating significant network failures and similar to those employed by Bitcoin. Together they will be utilized to comply with regulations while creating trustless applications.| LambdaClass Blog
Microservices, functions, stream processors and AI agents represent nodes in our graph. An incoming edge represents a trigger of work in the node, and the node must do the work reliably. I have been using the term reliable progress but I might have used durable execution if it hadn’t already been used to define a specific type of tool.| Jack Vanlightly
In part 2, we built a mental framework using a graph of nodes and edges to represent distributed work. Workflows are subgraphs coordinated via choreography or orchestration. Reliability, in this model, means reliable progress: the result of reliable triggers and progressable work. In part 3 we refine this graph model in terms of different types of coupling between nodes, and how edges can be synchronous or asynchronous. Let’s set the scene with an example, then dissect that example with the...| Jack Vanlightly
In part 1, we described distributed computation as a graph and constrained the graph for this analysis to microservices, functions, stream processing jobs and AI Agents as nodes, and RPC, queues, and topics as the edges. Within our definition of The Graph, a node might be a function (FaaS or microservice), a stream processing job, an AI Agent, or some kind of third-party service. An edge might be an RPC channel, a queue or a topic. For a workflow to be reliable, it must be able to make prog...| Jack Vanlightly
At some point, we’ve all sat in an architecture meeting where someone asks, “Should this be an event? An RPC? A queue?”, or “How do we tie this process together across our microservices? Should it be event-driven? Maybe a workflow orchestration?” Cue a flurry of opinions, whiteboard arrows, and vague references to sagas. Now that I work for a streaming data infra vendor, I get asked: “How do event-driven architecture, stream processing, orchestration, and the new durable execution...| Jack Vanlightly
In this latest post of the disaggregated log replication survey, we’re going to look at the Apache BookKeeper Replication Protocol and how it is used by Apache Pulsar to form topic partitions. Raft blends the roles and responsibilities into one monolithic protocol, MultiPaxos separates the monolithic protocol into separate roles, and Apache Kafka separates the protocol and roles into control-plane/data-plane. How do Pulsar and BookKeeper divide and conquer the duties of log replication? Let...| Jack Vanlightly
In this post, we’re going to look at the Kafka Replication Protocol and how it separates control plane and data plane responsibilities. It’s worth noting there are other systems that separate concerns in a similar way, with RabbitMQ Streams being one that I am aware of.| Jack Vanlightly
Over the next series of posts, we'll explore how various real-world systems and some academic papers have implemented log replication with some form of disaggregation. In this first post we’ll look at MultiPaxos. There are no doubt many real-world implementations of MultiPaxos out there, but I want to focus on Neon’s architecture as it is illustrative of the benefits of thinking in terms of logical abstractions and responsibilities when designing complex systems.| Jack Vanlightly
This post continues my series looking at log replication protocols, within the context of state-machine replication (SMR) or just when the log itself is the product (such as Kafka). So far I’ve been looking at Virtual Consensus, but now I’m going to widen the view to look at how log replication protocols can be disaggregated in general (there are many ways). In the next post, I’ll do a survey of log replication systems in terms of the types of disaggregation described in this post.| Jack Vanlightly
"True stability results when presumed order and presumed disorder are balanced. A truly stable system expects the unexpected, is prepared to be disrupted, waits to be transformed." — Tom Robbins This post continues my series looking at log replication protocols, within the context of state-machine replication (SMR) or just when the log itself is the product (such as Kafka). I’m going to cover some of the same ground from the Introduction to Virtual Consensus in Delos post, but focus on on...| Jack Vanlightly
This is the first of a number of posts looking at log replication protocols, mainly in the context of state machine replication (SMR). This first post will look at a log replication protocol design called Virtual Consensus from the paper: Virtual Consensus in Delos. In 2020, a team of researchers and engineers from Facebook, led by Mahesh Balakrishnan, published their work (linked above) on a log replication design called Virtual Consensus that they had built as the log replication layer of t...| Jack Vanlightly
We interview the team developing Iroh, a Rust peer-to-peer library that just works.| LambdaClass Blog
Both distributed aggregation and replication for high availability (yes, I am thinking of CRDTs) are techniques that can help tackle geo-replication, offline operation and edge/fog computing. Distributed aggregation often shares many properties in common with CRDT style convergent replication, but … Continue reading →| HASlab
We are opening a Post-Doc Position in HASLab supported by a one year grant. The grant is for a PhD holder, to join our research line in Distributed Data Aggregation and Monitoring. The successful candidate is expected to research on distributed aggregation, e.g. Flow … Continue reading →| HASlab
HASLab Seminar Abstract: Modern databases underlying large-scale Internet services guarantee immediate availability and tolerate network partitions at the expense of providing only weak forms of consistency, commonly dubbed eventual consistency. Even though the folklore notion of eventual consistency is very … Continue reading →| HASlab
We are opening several new positions in HASLab, to be supported by (up to) two year research grants. – One of the grants is for a Post Doc, to join the local team that is working on the evol…| HASlab
Returning Arrow tables over HTTP is relatively straightforward. The Arrow IPC streaming format is repurposed to the HTTP use case and you can return each serialized IPC batch as a chunk of the HTTP response — Transfer-Encoding: chunked — to enable streaming clients. Arrow IPC streams have a MIME type registered with IANA so you can also set the appropriate Content-Type header on your HTTP responses. 1 Content-Type: application/vnd.apache.arrow.stream You can find Python examples of this i...| Runtime Checks
If you're confused by the CAP theorem, you're not alone. In this post, we walk through some scenarios to see when CAP applies and how to use CAP to consider overall system availability.| DeBrie Advisory Blog
In this post, understand the different concepts of consistency as applied to distributed databases, as well as some issues with the conversation of consistency.| DeBrie Advisory Blog
Introduction| Andrei’s Personal Blog
What is this about?| Andrei’s Personal Blog
There are significant changes happening in distributed systems.| Colin Breck
TLDR ★★★★ Very easy to read. Direct link to the paper. Interesting takeaways Novel approach to conflict resolution: Unlike most data systems that push conflict resolution to the write phase, Dynamo…| CodeKraft
The point of failure is precisely where Web3 technology meets you and me. We must change our approach, not just to avoid failure, but to fully explore and expand the gifts of being human. The post Web3’s future #fail is avoidable appeared first on Philip Sheldrake.| Philip Sheldrake
Like many people who work with JVM languages, I do have many version of Java JDK installed on my machine. There are few utilities which help managing which version a given project should use and how to switch quickly between versions. Some of the most popular are: Jabba jenv Others prefer, a much simpler way to switch between JDKs like what is described in Managing Multiple JDKs on macOS. Similarly to the previous article, I have a small function in my ~/.profile which allows me to quickly sw...| Bits and pieces
Viewstamped Replication is one of the earliest consensus algorithms for distributed systems. It is designed around the log replication concept of state machines and it can be efficiently implemented for modern systems. The revisited version of the paper offers a number of improvements to the algorithm from the original paper which both: simplifies it and makes it more suitable for high volume systems. The original paper was published in 1988 which is ten years before the Paxos algorithm 1 was...| Bits and pieces
In 2017 I decided to print some PDFs to read instead of just have them in an ever-growing folder of Papers to Read. Here is the list of the ones I managed to read this year. Out of the Tar Pit / Ben Moseley, Peter Marks / 2006 This is my favorite paper. It is a bit long (around 60 pages), but well worth reading. It is very thorough in defining what software complexity is. The authors make the distinction between essential and accidental complexity (following Fred Brook’s ideas on the topic)...| Runtime Checks
Envoy is a service mesh proxy with a ton of built-in capabilities. In this post, I’ll be discussing the Original Destination feature in Envoy.| Venil Noronha
Service meshes solve some of the key challenges in the cloud-native world today, and in this post I’ll be discussing about security.| Venil Noronha
The concept of State Machines is not new, and they form the core of a lot of systems that we come across daily, for example, elevators and traffic lights. State machines make it a lot easy to understand and implement multi-faceted systems where inputs can be asynchronous and are triggered by multiple sources.| Venil Noronha
Envoy is a programmable L3/L4 and L7 proxy that powers today’s service mesh solutions including Istio, AWS App Mesh, Consul Connect, etc. At Envoy’s core lie several filters that provide a rich set of features for observing, securing, and routing network traffic to microservices.| Venil Noronha
The sidecar proxy pattern is an important concept that lets Istio provide routing, metrics, security, and other features to services running in a service mesh. In this post I’ll explain key techniques that power Istio and I’ll also show you a way to build a simple HTTP traffic-sniffing sidecar proxy.| Venil Noronha
gRPC-Web enables web applications to access gRPC backends via a proxy like Envoy. Envoy serves as the default proxy for Istio, and, so, we can leverage Istio’s EnvoyFilter construct to create seamless, well connected, Cloud-Native web applications.| Venil Noronha
Envoy is a lightweight service proxy designed for Cloud Native applications. It’s also one of the few proxies that support gRPC, which is based on the H2 (HTTP/2) protocol. gRPC is a high performance RPC (Remote Procedure Call) framework and it supports a plethora of environments.| Venil Noronha
So, you’ve walked through the Istio Mixer Adapter guide and want to now publish your own amazing adapter? This post will run you through the process of setting sail your own adapter on the seas of production.| Venil Noronha
Ever wondered what mTLS (mutual TLS) looks like? Come, learn to implement mTLS using Golang and OpenSSL. Introduction TLS (Transport Layer Security) provides the necessary encryption for applications when communicating over a network. HTTPS (Hypertext Transfer Protocol Secure) is an extension of HTTP that leverages TLS for security. The TLS| Venil Noronha
Istio provides sophisticated routing mechanics via concepts like VirtualService, DestinationRule, Gateway, etc. Istio 1.0 enabled HTTP traffic shifting via weighted route definitions. I was able to contribute a similar feature for TCP/TLS services via my PRs on Envoy and on Istio. The feature in Envoy was released in 1.8.0 and| Venil Noronha
blog.grobox.de -| Torsten's Thoughtcrimes
blog.grobox.de -| Torsten's Thoughtcrimes
The point of failure is precisely where Web3 technology meets you and me. We must change our approach, not just to avoid failure, but to fully explore and expand the gifts of being human.| Generative Identity
In this post, we will talk about some exciting and powerful use cases of event-driven systems that can be solved using RabbitMQ and SNS - SQS combo. Goal Say we have several microservices running in production, and each service encloses a business entity.| Dinesh Gowda
Introduction The Challenge Using Kubernetes, running application workloads resiliently and reliably is easy, if the workload is stateless. Let's think of a file format converter micro service or a simple web app serving static content| Marc Brandner
Quorum systems allow consistency of replicated data; every time a group of servers needs to agree on something, a quorum is involved in the decisions. An example could be the leaderless databases, such as Dynamo. Read-write quorums define two configurable values, R and W.| Samuele Resca
One of my favorite technical projects involved overcoming a network constraint. The virtual machines (VMs) hosting the core services kept exhausting available ports. Once all ports were used up, ne…| CodeKraft
Shared mutable state invites complexity into our programs. Programming languages help with this complexity inside a program, but not across network boundaries between programs.| Interjected Future