Probabilistic record linkage, Data Deduplication, Data Science, Engineering and the Environment| www.robinlinacre.com
An interactive explanation of how a fault tolerant trie can be used for address matching| www.robinlinacre.com
A second equivalent mental model to help think about how we arrive at predicted probabilities in the Fellegi Sunter model| Your Site's RSS Feed
A set of interactive, explorable explanations of the Fellegi Sunter model of probabilistic record linkage. This article shows how to compute the model| Your Site's RSS Feed
Why I’m backing Vega-Lite as our default tool for data visualisation| Your Site's RSS Feed
Test how good you are at identifying UK birdsong recordings| Your Site's RSS Feed
Listen to UK birdsong using the xeno-canto API| Your Site's RSS Feed
What is the comparative carbon footprint of electric cars? As an existing petrol ICE car owner, should you switch to an electric car?| Your Site's RSS Feed
The Downfall of Command and Control Data Leadership - why new big bang data platforms fail| Your Site's RSS Feed
How good is Splink: Are more complex probabilistic linkage models more accurate?| Your Site's RSS Feed
A set of interactive, explorable explanations of the Fellegi Sunter model of probabilistic record linkage. This article shows how to compute the model from an algorithmic perspective| Your Site's RSS Feed
A visualisation of how the connected components algorithm works| Your Site's RSS Feed
Demystifying Apache Arrow - some observations from a data scientist. Learning more about a tool that can filter and aggregate two billion rows on a laptop in two seconds| Your Site's RSS Feed
Effective testing of analytical models using automated sense checks| Your Site's RSS Feed
An intuitive explanation for how the Expectation Maximisation algorithm is able to produce unsupervised estimates of Splink model parameters| Your Site's RSS Feed
Energy usage calculator for everyday activities| Your Site's RSS Feed
Evaluating 1 billion record comparisons to deduplicate 7 million records in two minutes| Your Site's RSS Feed
How to ensure that all available information is used to make predictions| Your Site's RSS Feed
Simple flight distance calculator. Advert free. Export data to spreadsheet.| Your Site's RSS Feed
A history of my flights| Your Site's RSS Feed
Graph editor for illustrating clustering concepts| Your Site's RSS Feed
The first in a series of interactive, explorable explanations of the Fellegi-Sunter model, providing an introduction to probabilistic record linkage (data deduplication).| Your Site's RSS Feed
Introducing Splink, a fast, accurate and scalable fuzzy record matching library that supports multiple SQL backends| Your Site's RSS Feed
The phrase 'why don't you just' is problematic| Your Site's RSS Feed
Understanding the Spark UI by example: the Left Join| Your Site's RSS Feed
A demo of a live Splink model in the browser| Your Site's RSS Feed
What will be the impact of LLMs on knowledge workers| Your Site's RSS Feed
My mental model of LLMs, their strengths and shortcomings| Your Site's RSS Feed
Deep dive into the role and interpretation of m and u probabilities in the Fellegi-Sunter model for probabilistic linkage. Learn how these probabilities impact match weights and how to quantify the strength of evidence for or against a record match.| Your Site's RSS Feed
Generate m and u probabilities to input into Splink. Part of the introduction to Fellegi Sunter series.| Your Site's RSS Feed
A calculator for converting between match weights, probabilities, and Bayes factors| Your Site's RSS Feed
A set of interactive, explorable explanations of the Fellegi Sunter model of probabilistic record linkage. This article covers the dependencies between match weights.| Your Site's RSS Feed
My microblog| Your Site's RSS Feed
A set of interactive, explorable explanations of the Fellegi Sunter model of probabilistic record linkage. This article shows the derivation of the mathematical formulation of the model| Your Site's RSS Feed
Splink and the open source dividend| Your Site's RSS Feed
Open data should be served as CORS-enabled parquet files rather than using a custom API| Your Site's RSS Feed
Partial match weights in the Fellegi-Sunter model. Part of an explorable, interactive introduction to probabilistic record linkage (data deduplication) theory| Your Site's RSS Feed
Using treemaps to visualise updating the prior with information about a scenario in the Fellegi Sunter model| Your Site's RSS Feed
Visualising the correspondence between match weights, probabilities, Bayes factors and their intuitive explanations| Your Site's RSS Feed
An assortment of quotes that I like| Your Site's RSS Feed
How to improve the likelihood of success whilst reducing the governance burden on teams| Your Site's RSS Feed
Why DuckDB has become my go-to tool for data processing, offering simplicity, speed, and powerful features.| Your Site's RSS Feed
SQL should be the first option considered for new data engineering work. It’s robust, fast, future-proof and testable. With a bit of care, it’s clear and readable.| Your Site's RSS Feed
Splink 3 now offers support for Python and AWS Athena backends, in addition to Spark. It's now easier to use, faster and more flexible, and can be used for close to real time linkage.| Your Site's RSS Feed
How to work past the limits of LLMs to build more complex apps| Your Site's RSS Feed
The Thorniest Problem of Building an Analytical Platform: Enabling collaborative development of the platform itself without losing control of complexity.| Your Site's RSS Feed
The emerging impact of LLMs on productivity| Your Site's RSS Feed
A set of interactive, explorable explanations of the Fellegi Sunter model of probabilistic record linkage. This article discusses match weights.| Your Site's RSS Feed
This article is part of the probabilistic linkage training materials| www.robinlinacre.com
A bag of tricks to improve the accuracy of geocoding| www.robinlinacre.com