A recurring problem when searching for text is identifying which parts of the text are, in some sense, useful. A first-order solution is to extract every word from the text and match documents based on whether they contain those words. This works really well if you don’t have a lot of documents to search through, but as the corpus grows, so does the number of matches. It’s possible to bucket the words based on where they appear in the document, but this is not something I...