Nov. 24, 2023, 9:40 p.m. | /u/ImGallo

Natural Language Processing www.reddit.com

Hey, I've been working on PDF data extraction for a while now. I usually rely on well-known Python libraries like PyPDF2, PyMuPDF, and Tabula. I've also had good results combining Azure Cognitive Services with regular expressions and the libraries above. I haven't used them myself, but I've seen that some models on Hugging Face are being used for PDF data extraction as well.
So, my question is: What are the most important or useful tools/techniques/models for extracting information from …
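
For context, the library-plus-regex approach mentioned above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the field names, the regular expressions, and the helper names `read_pdf_text` and `extract_fields` are all assumptions made up for the example, and the text-reading step assumes PyMuPDF is installed.

```python
import re
from typing import Dict, List

# Hypothetical field patterns for an invoice-like document.
# Real documents will need their own patterns.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    "total": re.compile(r"Total\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def read_pdf_text(path: str) -> str:
    """Return the text of every page, joined by newlines (needs PyMuPDF)."""
    # Imported lazily so the regex helper below works without PyMuPDF.
    import fitz  # PyMuPDF

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def extract_fields(text: str) -> Dict[str, List[str]]:
    """Apply every pattern to the text and collect all captured values."""
    return {name: pattern.findall(text) for name, pattern in FIELD_PATTERNS.items()}


if __name__ == "__main__":
    # Stand-in for text that read_pdf_text("some.pdf") might return.
    sample = "Invoice No: INV-2023-001\nDate: 2023-11-24\nTotal: $1,234.50"
    print(extract_fields(sample))
```

Keeping the text extraction separate from the pattern matching means the same `extract_fields` step could be fed text from a different backend, such as an OCR service, without changes.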

