I had a shower thought the other day - “I’m an ebook hoarder, I have a bunch of technical ebooks as mobi/epub files, but also PDFs. It’d be nice to be able to slurp them up into LLM tools - for summarising, categorising and the rest”. I’d already done epubs (see my recent Stonemouth analysis), so PDFs shouldn’t be too hard, right? It turns out PDFs are surprisingly complex - they often aren’t linear documents at all, they are very display/print oriented - and things that appear simpl...