Extracting text from a PDF is usually straightforward when the text is in English and the file doesn’t have embedded fonts. Once either assumption no longer holds, basic Python libraries like pdfminer or pdfplumber become hard to rely on. Last month, I was tasked with extracting text from a Gujarati-language PDF and exporting data fields such as name, address, and city to JSON.