Some PDF files are “tagged”, which means they contain information about the structure of the document. This structure is embedded as metadata within the PDF and is made up of a hierarchy of tags that label elements such as headings, paragraphs, lists, tables, and images. This is very similar to HTML, where text is […] The post How to extract text from a PDF as Markdown appeared first on Java PDF Blog and was written by Jacob Collins.
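
To make the idea of a tag hierarchy concrete, here is a minimal sketch in Java of how such a tree might be walked and turned into Markdown. It is an illustration only, not the method from the full post and not any library's API: the `Tag` record and the `TaggedPdfSketch` class are hypothetical names invented for this example, and the tree is built by hand rather than read from a PDF. The tag names `H1`, `H2`, `P`, `L`, and `LI` follow the standard structure types defined for tagged PDF.

```java
import java.util.List;

public class TaggedPdfSketch {

    // Hypothetical in-memory model of one structure element: a tag name,
    // optional text content, and child elements. Not a real library type.
    public record Tag(String type, String text, List<Tag> children) {
        public static Tag leaf(String type, String text) {
            return new Tag(type, text, List.of());
        }

        public static Tag parent(String type, Tag... children) {
            return new Tag(type, "", List.of(children));
        }
    }

    // Walks the tag tree depth-first and returns the Markdown it produces.
    public static String toMarkdown(Tag root) {
        StringBuilder out = new StringBuilder();
        append(root, out);
        return out.toString();
    }

    private static void append(Tag tag, StringBuilder out) {
        // Emit Markdown for the tag types this sketch recognises;
        // container tags such as Document or L emit nothing themselves.
        switch (tag.type()) {
            case "H1" -> out.append("# ").append(tag.text()).append("\n\n");
            case "H2" -> out.append("## ").append(tag.text()).append("\n\n");
            case "P" -> out.append(tag.text()).append("\n\n");
            case "LI" -> out.append("- ").append(tag.text()).append("\n");
            default -> { }
        }
        for (Tag child : tag.children()) {
            append(child, out);
        }
        // Close a list with a blank line so following content starts a new block.
        if (tag.type().equals("L")) {
            out.append("\n");
        }
    }

    public static void main(String[] args) {
        Tag doc = Tag.parent("Document",
                Tag.leaf("H1", "Report"),
                Tag.leaf("P", "Intro text."),
                Tag.parent("L",
                        Tag.leaf("LI", "First"),
                        Tag.leaf("LI", "Second")));
        System.out.print(toMarkdown(doc));
    }
}
```

The sketch deliberately ignores tables, images, and nested lists; a real converter would need to handle those, along with reading the structure tree from the PDF itself.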