From: Made of Bugs
Finding near-duplicates with Jaccard similarity and MinHash - Made of Bugs
https://blog.nelhage.com/post/fuzzy-dedup/
How do you find near-duplicates in a massive collection of documents? An exploration of the Jaccard similarity metric, and the MinHash hashing trick used to efficiently approximate it at web scale.