Feb. 17, 2024, 11:26 a.m. | /u/4AVcnE

Machine Learning www.reddit.com

Hello,

I'm working for a German insurance company looking to automate the extraction of data from customer invoices received as PDFs. We're particularly interested in details like invoice numbers, dates, names, addresses, and line items with prices, and we want to output this information as JSON for further processing. Any of these entities may appear multiple times or not at all.
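To make the target concrete, here is a rough sketch of what such an output could look like. The class names, field names, and sample values are illustrative assumptions, not an actual schema; the only point is that every entity is modelled as a list (or an optional value) so that "multiple times or not at all" is representable.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class LineItem:
    # Hypothetical line item: a description plus an optional price.
    description: str
    price: Optional[float] = None  # None when no price could be read


@dataclass
class Invoice:
    # Hypothetical container. Each entity is a list because it may
    # appear several times on an invoice, or not at all (empty list).
    invoice_numbers: List[str] = field(default_factory=list)
    dates: List[str] = field(default_factory=list)
    names: List[str] = field(default_factory=list)
    addresses: List[str] = field(default_factory=list)
    line_items: List[LineItem] = field(default_factory=list)


def to_json(invoice: Invoice) -> str:
    # asdict() recurses into nested dataclasses, so line items
    # become plain dicts; ensure_ascii=False keeps umlauts readable.
    return json.dumps(asdict(invoice), ensure_ascii=False, indent=2)


# Made-up sample values, for illustration only.
example = Invoice(
    invoice_numbers=["RE-2024-001"],
    names=["Max Mustermann"],
    line_items=[LineItem("Behandlung", 42.5)],
)
print(to_json(example))
```

Entities that were not found (here, dates and addresses) simply come out as empty lists, so downstream code does not need to distinguish between a missing key and an empty result.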

We've tried several methods without success:

* **GPT-4 and various other models**: Didn't consistently produce structured JSON output.
* **Impira/LayoutLM for invoices**: Struggled with …
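For context on the JSON point, a minimal sketch of a tolerant parser, assuming the inconsistency is that a model reply sometimes wraps the JSON object in extra prose or is not valid JSON at all. The function name `extract_json_object` is hypothetical; it only uses the standard library `json.JSONDecoder.raw_decode` and is not tied to any particular model API.

```python
import json
from typing import Optional


def extract_json_object(reply: str) -> Optional[dict]:
    """Return the first JSON object embedded in `reply`, or None."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever text follows it.
            obj, _ = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not a valid object here; try the next opening brace.
        start = reply.find("{", start + 1)
    return None
```

A caller could treat a `None` result as a signal to retry the request or to route the invoice to manual review, rather than passing malformed output downstream.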
