# Natural Language Toolkit: Tokenizers| www.nltk.org
How do you find near-duplicates in a massive collection of documents? An exploration of the Jaccard similarity metric, and the MinHash hashing trick used to efficiently approximate it at web scale.| Made of Bugs
About The Project Trying to stay up to date with the latest security research is challenging. There are countless security blog posts, and interesting academic papers to keep up with. Realistically you don’t have time to sit down and read everything that looks interesting. Wouldn’t it be great to have your own personal “Audible”-esque service to listen to articles as you did chores around the house? This blog post is about integrating AWS’ Text-to-Speech service, “Polly” to gene...| www.archcloudlabs.com