OSWALD is a Write-Ahead Log (WAL) design built exclusively on object storage primitives. It works with any object storage service that…| nvartolomei.com
The FLP theorem| shachaf.net
I ordered a set of 10 Compute Blades in April 2023 (two years ago), and they just arrived a few weeks ago. In that time Raspberry Pi upgraded the CM4 to a CM5, so I ordered a set of 10 16GB CM5 Lite modules for my blade cluster. That should give me 160 GB of total RAM to play with.| www.jeffgeerling.com
XTX Markets is a leading algorithmic trading firm which uses state-of-the-art machine learning technology to produce price forecasts for over 50,000 financial instruments across equities, fixed income, currencies, commodities and crypto. It uses those forecasts to trade on exchanges and alternative trading venues, and to offer differentiated liquidity directly to clients worldwide. The firm trades over $250bn a day across 35 countries and has over 250 employees based in London, Singapore, New...| www.xtxmarkets.com
Terraform state is a distributed systems problem masquerading as file storage. Graph state fixes the bottlenecks.| Stategraph
Missing the Forest for the Sequence Trees.| lewiscampbell.tech
Learn how queues make horizontal scaling, scheduling, and flow control easier in cloud systems, and how to make them durable and observable.| www.dbos.dev
Dens Sumesh - Software Engineer| densumesh.dev
Why do we use caches at all? Can databases fully replace them?| avi.im
Agile approaches like Scrum recommend a "just enough" attitude in software development and this is also the case when you discuss tools. Ideally, you would work with a small team that is collocated, but this is not always possible and you might be running your project virtually with a distributed Scrum team scattered around the world. If you don't want to start using a sophisticated tool to manage your efforts, you might be interested in adopting some web tools that will fit your particular n...| Scrum Agile Project Management Expert
Fixed-size ring buffers with full lock freedom| h4x0r.org
There’s a semi-well-known adage in software development that says when you have a hard code change, you should “first make the hard change easy, and then make the easy change.” In other words, refactor the code (or do whatever else you need to do) to simplify the change you’re trying to make| blog.appliedcomputing.io
Sharing is Scaring: Why is Cloud File-Sharing Hard?| blog.brownplt.org
L2AW| law-theorem.com
Reinforcement learning meets iterated game theory meets theory of mind| The Dan MacKinlay stable of variably-well-consider’d enterprises
“The [structural] mechanism producing these problematic outcomes is really robust and hard to resolve.”…| Ars Technica
Matrix, the open protocol for secure decentralised communications| matrix.org
What even is distributed systems| notes.eatonphil.com
1 Background| jepsen.io
Viewstamped replication (VR) is a replication technique that handles failures when one or more nodes in a cluster crash. It works as a wrapper on top of a non-distributed system and allows the underlying business logic to be applied independently, while the protocol itself takes care of replication. The protocol was introduced in the original paper and later revised with a set of optimizations in a new paper, Viewstamped Replication Revisited.| Distributed Computing Musings
If you’re building a backend mostly alone, Elixir lets you avoid service sprawl and ship features faster.| I'm Konstantin
The open-source AI scene has been kicking goals. People have wrestled models, datasets, and all the fixings away from the big-wigs. The final boss of that game is access to expensive compute. Training a foundation model from scratch takes a warehouse full of GPUs that costs more than a small nation’s GDP. It’s been the one thing keeping AI development firmly in the hands of a few tech giants with cash to burn. Until now, maybe. So, the citizen science equivalent for the NN a...| The Dan MacKinlay stable of variably-well-consider’d enterprises
Sungrow Power has revealed its newest hybrid residential energy storage system. Upgrades to its MG Series inverters and a battery will be launched in Q4.| Energy Storage
Cloud databases face a fundamental challenge: how to remain available and durable under node failures? Modern cloud databases approach this by separating two concerns that used to be tightly coupled: compute and storage. The database engine becomes stateless, while the write-ahead log gets replicated across multiple nodes to guarantee durability. If a database server dies, another one can pick up exactly where it left off by reading from the replicated log.| Benjamin Hilprecht
We share learnings and best practices for testing durable execution based on our testing of DBOS.| www.dbos.dev
A list of key concepts for building and testing reliable distributed systems, with basic definitions and deep references.| antithesis.com
Coarse-graining empowerment| The Dan MacKinlay stable of variably-well-consider’d enterprises
Monitoring my Homelab, Simply| b.tuxes.uk
I’ve had a few conversations about async code recently (and not so recently) and seen some code that seems to make wrong assumptions about async, so I figured it was time to have a serious chat about async: what it’s for, what it guarantees and what it doesn’t.| Il y a du thé renversé au bord de la table !
Lots of projects claim to be the “smallest” or “simplest” Kubernetes, but they never provide data to back it up. Let’s look at how these distributions compare to Talos Linux. Note that Talos Linux is not a Kubernetes distribution, but rather a Linux distribution purpose-built for running upstream Kubernetes. Before we look at the data, […]| Sidero Labs
Put that script inside a folder, share the folder with someone via Syncthing or Dropbox or whatever,| holdtherobot.com
Author: Igor Konnov| Protocols Made Fun
MSc Computer Science student| Fabian Lindfors
How deterministic simulation testing can help us build more reliable distributed systems and bridge the gap between development and production environments.| Pierre Zemb's Blog
An interesting inverse design question: how should I design a system to optimise for truthfulness? Brief summary here. @Frongillo2024Recent: This note provides a survey for the Economics and Computation community of some recent trends in the field of information elicitation. At its core, the field concerns the design of incentives for strategic agents to provide accurate and truthful information. Such incentives are formalized as proper scoring rules, and turn out to be the same obj...| The Dan MacKinlay stable of variably-well-consider’d enterprises
Learning agents in a multi-agent system which account for and/or exploit the fact that other agents are learning too. This is one way of formalising the idea of theory of mind. Learning with theory of mind works out nicely for reinforcement learning, in e.g. opponent shaping, and may be an important tool for understanding AI agency and AI alignment, as well as aligning more general human systems. Other interesting things might arise from a good theory of other-aware learning, such ...| The Dan MacKinlay stable of variably-well-consider’d enterprises
1 Update, 2025-05-03| jepsen.io
The last few days I spent some time digging into the recently announced KIP-1150 ("Diskless Kafka"), as well as AutoMQ’s Kafka fork, tightly integrating Apache Kafka and object storage, such as S3. Following the example set by WarpStream, these projects aim to substantially improve the experience of using Kafka in cloud environments, providing better elasticity, drastically reducing cost, and paving the way towards native lakehouse integration. This got me thinking, if we were to start all ove...| www.morling.dev
A curated collection of resources about deterministic simulation testing for distributed systems.| Pierre Zemb's Blog
We’re introducing a new layer of verification — a user-friendly, easily recognizable badge. Additionally, independent organizations can verify accounts directly through our Trusted Verifiers feature.| Bluesky
I am currently winding down the Mastodon bots I used to post sunrise and sunset times. The precipitating event is that the admin of the instance hosting the associated accounts demanded they be made nigh-undiscoverable, but the underlying cause is that it’s become increasingly clear that Mastodon isn’t, and won’t ever be, a good platform for “asynchronous ephemeral notifications of any kind”. I’d also argue (more controversially) that it’s simply not good infrastructure for social...| Rob’s Posts
Personal website for some random tidbits I work on| maknee.github.io
Tips and lessons learned from building systems directly against object stores| SpiralDB
Bug Bash 2025 Conference Experience| concerningquality.com
#Git With Me| git.sr.ht
Neighbourhoodie Software is a software development company based in Berlin, Germany. We are experts in CouchDB, PouchDB, and Offline First.| neighbourhood.ie
Learn how Sequin implements Postgres logical replication with guaranteed message delivery and ordering. Discover how we built a high-throughput data pipeline without missing events.| Sequin blog
On rituals without (necessarily) faith. TBD. Related: tribal bonding, mind altering substances. Incoming: Future Day 2025 – Science, Technology & the Future Wheal’s Homegrown Humans Newsletter Ritual Behavior, Habits, Human Culture, Religion, Civilization, Marriage, Death, Burning Man & Community | Dimitris Xygalatas | #75 (5) Psychedelics, Civilization, Religion, Death & Plant Medicine | Brian Muraresku | #1 Shamanism, Psychedelics, Social Behavior, Religion & Evolution of Huma...| The Dan MacKinlay stable of variably-well-consider’d enterprises
Notebook on the idea of human domestication. Possibly the opposite of being stroppy. Paul Christiano, “What failure looks like”: Amongst the broader population, many folk already have a vague picture of the overall trajectory of the world and a vague sense that something has gone wrong. There may be significant populist pushes for reform, but in general these won’t be well-directed. Some states may really put on the brakes, but they will rapidly fall behind economically and militaril...| The Dan MacKinlay stable of variably-well-consider’d enterprises
In the "Let’s Take a Look at…!" blog series I am going to explore interesting projects, developments and technologies in the data and streaming space. This can be KIPs and FLIPs, open-source projects, services, and more. The idea is to get some hands-on experience, learn about potential use cases and applications, and understand the trade-offs involved. If you think there’s a specific subject I should take a look at, let me know in the comments below! That guy above? Yep, that’s me, w...| www.morling.dev
The financial transactions database designed for mission critical safety and performance. - tigerbeetle/tigerbeetle| GitHub
Here is how to implement a distributed lock with S3| quanttype.net
Antithesis' ability to play like a computer, not a human being, is central both to finding bugs and beating side-scrolling shooters.| antithesis.com
Why are scalable systems locally-inefficient, and locally-efficient systems unscalable? Plus, new book release!| buttondown.com
If you’re interested in supporting mybinder.org with cloud resources, financial resources, or human resources, please see the Support Binder page for how you can help. tl;dr: The 2i2c team is joining the mybinder.| 2i2c
Nation-scale Matrix deployments will fail if built on the community version of Synapse. Huge deployments need a different architecture, which is what Synapse Pro delivers.| Element Blog
Pushing the whole company into the past on purpose| rachelbythebay.com
OK, queues.| ferd.ca
While it’s trivial to measure the end-to-end runtime of a Dask workload, the next logical step - breaking down this time to understand if it could be faster - has historically been a much more arduous task that required a lot of intuition and legwork, for novice and expert users alike. We wanted to change that. (Figure: populated Fine Performance Metrics dashboard.)| Blog
Hendrik Makait, 2023-05-16| Blog
Miles Granger| Blog
At Coiled we develop Dask and automatically deploy it to large clusters of cloud workers (sometimes 1000+ EC2 instances at once!). In order to avoid surprises when we publish a new release, Dask needs to be covered by a comprehensive battery of tests — both for functionality and performance. (Figure: nightly tests report.)| Blog
Hendrik Makait| Blog
It’s finally done! 🎉 I am so excited to announce “Communicating Chorrectly with a Choreography”, the first zine from my research group. You can read it online, or print your own free copies to read offline!| decomposition ∘ al
The | vereis.com
Usually, in an event-driven architecture, events are emitted by one service and listened to by many (1:n). But what if it's the other way around? If one service needs to listen to events from many other services?| www.reactivesystems.eu
Cloudflare consistently generates the highest quality public incident writeups of any tech company. Their latest is no exception: Cloudflare incident on November 14, 2024, resulting in lost logs. I…| Surfing Complexity
Recently due to various events (namely a lot of people getting off of| dustycloud.org
Incremental computation represents a transformative (!) approach to data processing. Instead of recomputing everything when your input chang...| muratbuffalo.blogspot.com
November has sucked so far. One upside of the terrible nonsense is that more people are fleeing X. Many are choosing Bluesky. I’ve seen a bunch of takes about this recently, but I keep seeing things I disagree with. I figure that’s a good enough excuse to write more about this weird-assed social network.| anderegg.ca
If you were to design an open social networking protocol, what would that look like? Which metaphors and comparisons would you use to get a general idea of how the network functions? And what would you answer if people ask if your network is decentralised and federated?| fediversereport.com
We can categorize sync platforms across nine dimensions: data size, data update rate, the structure of the data, input latency, offline support, numbe...| stack.convex.dev
A recurring line in discussion of federated, decentralized social media is that no one cares about it. They just want their Twitter without the Nazis. Which is okay. But how it looks on the backend matters. When the illusion of a unified user experience breaks, how accessible the escape pod| Kye Fox
Given that events play such a central role in event-driven architecture, there’s an astonishing lack of agreement on what should be contained in an event. This may be rooted in the fact that, depending on your perspective, events fulfill different purposes.| www.reactivesystems.eu
Sync platforms like Convex simplify distributed state management, ensuring that developers can focus on building their applications rather than managi...| stack.convex.dev
Note: I originally wrote this on an internal Amazon blog in 2006. This is the original version with a few edits. A newer paper covering the same content can be found on the AWS Builder's Library, Avoiding fallback in distributed systems. More complicated software is more buggy, so programmers try to apply Occam's Razor and code as simply as possible (but no simpler). But how does one define "complexity"?| a-nickels-worth.dev
Debunking the myth of "exactly-once delivery." Learn the real differences between messaging system guarantees and what they mean for your architecture.| Sequin blog
An opportunity for everyone to make a little self-test. Do you believe any of these five statements? If so, don't worry, you're not the only one, I've come across them many times. I'm very convinced they're untrue, though. This is my little attempt to better a shared understanding of some properties of event-driven architecture.| www.reactivesystems.eu
There’s much uncertainty and doubt (and maybe even fear?) around event-driven architecture. One example is the belief that it’s irrelevant for REST APIs, as using HTTP verbs is quite clearly not event-driven. But behold - you don’t always have to go all-in to win.| www.reactivesystems.eu
Yesterday I read an article describing the GCRA rate limiting| dotat.at
There are significant changes happening in distributed systems.| Colin Breck
In distributed systems, for instance when scaling out some workload to multiple compute nodes, it is a common requirement to select a leader for performing a given task: only one of the nodes should process the records from a Kafka topic partition, write to a file system, call a remote API, etc. Otherwise, multiple workers may end up doing the same task twice, overwriting each other’s data, and worse.| www.morling.dev
Sometimes, a seemingly simple and obvious solution can lead to a series of problems later on. This is especially true when adding retries.| Medium
TIL: Mermaid Gantt diagrams are great for displaying distributed traces in Markdown| brycemecum.com
How Notion built and grew our data lake to keep up with rapid growth| Notion
Today’s data scientist has a plethora of options when processing data in Apache Kafka. In terms of stream processing, popular options are Kafka Streams, ksqlDB, and Apache Flink. In terms of …| Robert Yokota
This post could’ve been titled “Nostr vs ATProto”, but that really isn’t what I wanted to do here. While I will be comparing and contrasting them a lot, and that’s kind of even the point of writing this, I didn’t want to really pit the two against each other at all, and especially not with the title. I also want to try avoiding commenting on the differences between the communities that have formed on the protocols and their apps, although I definitely will be looking at the philos...| shreyanjain.net
I've been developing and quickly deploying a distributed system, which is a class of software where bugs are expensive. A few hundred petabytes later, we haven't lost a single byte of data, also thanks to a simple trick which catches a large class of bugs when delegating responsibilities to possibly buggy software. It's a neat use of cryptography beyond security, so here's a small description.| mazzo.li
Talk on “What’s the Story in EBS Glory: Evolutions and Lessons in Building Cloud Block Store” paper for distributed systems reading group.| Andrey Satarin
In this blog post, we share our journey to build a ClickHouse-powered logging solution that today stores over 19 PiB of data (1.13 PiB compressed) in our AWS regions alone, and costs 200x less than Datadog.| ClickHouse
Towards zero-downtime upgrades of stateful systems| stevana.github.io
“The CAP theorem is too simplistic and too widely misunderstood to be of much use for characterizing systems. Therefore I ask that we retire all references to the CAP theorem, stop talking about the CAP theorem, and put the poor thing to rest.” (Martin Kleppmann) In 2000, Eric Brewer introduced the CAP Conjecture during his keynote address Towards Robust Distributed Systems at the Principles of Distributed Computing conference. Brewer posited that a distributed system cannot achieve Consistency, A...| Dominik Tornow
This Valentine's Day, we're not just celebrating love and companionship; we're also celebrating the groundbreaking advancements in the Stalwart Mail Server with the release of version 0.6.0. In a world where reliability and flexibility in mail server management are more critical than ever, Stalwart Mail Server takes a significant leap forward with the introduction of distributed SMTP queues and the integration of expressions in configuration files. Let's delve into how these features transfor...| stalw.art
I have alluded to "hypothetical S2" (Stream Store), a true counterpart to S3 for data in motion. As I work on making S2 real, I wanted to share the design and how it shaped up. Vision. Unlimited streams. A pain point with most streaming data solutions ...| unofficial blog
Even if Agile approaches favor collocated teams, distributed Scrum teams with remote work are more common than you might think. Many Agile software development teams are based on a virtual organization. This article presents some free online retrospective tools that can be used to facilitate retrospectives for distributed and remote Scrum teams. You will find in this article only online Scrum retrospective tools that are supposed to be used for free in the long term. We do not list tools...| Scrum Agile Project Management Expert