Nov. 24, 2023, 9:40 p.m. | /u/ImGallo

Natural Language Processing www.reddit.com

Hey, I've been working on PDF data extraction for a while now. I usually rely on well-known Python libraries like PyPDF2, PyMuPDF, and Tabula. I've also had good results using Azure Cognitive Services together with regular expressions and the libraries mentioned above. I haven't used them myself, but I've seen that some models on HuggingFace are being used for PDF data extraction as well.
So, my question is: What are the most important or useful tools/techniques/models for extracting information from …
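
For context, here is a minimal sketch of the library-plus-regex approach described above. It assumes PyMuPDF is installed (imported as `fitz`). The field names and patterns (`invoice_number`, `total`) are hypothetical examples for an invoice-like document, not anything from the original post; real documents will need their own patterns.

```python
import re

# Hypothetical field patterns for an invoice-like document (assumption:
# the fields appear as "Invoice No: ..." and "Total: $...").
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\S+)", re.IGNORECASE),
    "total": re.compile(r"Total\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_text(pdf_path):
    """Return the plain text of every page in the PDF, joined by newlines."""
    import fitz  # PyMuPDF; imported lazily so the regex step works without it

    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)


def extract_fields(text):
    """Apply each pattern to the text; fields with no match map to None."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


if __name__ == "__main__":
    # Stand-in for text pulled from a PDF via extract_text("some_file.pdf").
    sample = "Invoice No: INV-001\nTotal: $1,234.50\n"
    print(extract_fields(sample))
```

Keeping text extraction separate from field matching means the regex step can be tested on plain strings and reused if the text comes from a different source, such as an OCR service.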

