April 28, 2023, 5:58 a.m. | Toon Beerten

Towards Data Science - Medium towardsdatascience.com

Donut versus Pix2Struct on custom data

Image by author (with)
Donut and Pix2Struct are image-to-text models that combine the simplicity of pure pixel inputs with visual language understanding tasks. Simply put: an image goes in and extracted indexes come out as JSON.
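To make "extracted indexes come out as JSON" concrete: a Donut-style decoder emits a flat string of tag-delimited fields, which is then turned into a nested structure. The sketch below is a simplified, standalone illustration of that last step, not the library's own conversion code. The function name `tokens_to_json` and the field names (`header`, `invoice_no`, `date`, `total`) are hypothetical, and repeated groups (lists of line items) are deliberately not handled.

```python
import json
import re

# Matches <s_key>...</s_key>; the backreference \1 forces the closing tag
# to carry the same key as the opening tag.
FIELD_PATTERN = re.compile(r"<s_(\w+)>(.*?)</s_\1>", re.DOTALL)


def tokens_to_json(sequence: str) -> dict:
    """Convert a tag-delimited string into a nested dict.

    Simplified assumption: every field appears once per level, so
    repeated groups (lists) are not supported.
    """
    result = {}
    for match in FIELD_PATTERN.finditer(sequence):
        key = match.group(1)
        value = match.group(2).strip()
        # A value that still contains tags is a nested group: recurse into it.
        result[key] = tokens_to_json(value) if "<s_" in value else value
    return result


# Hypothetical decoder output for an invoice image.
decoded = (
    "<s_header><s_invoice_no>A-17</s_invoice_no>"
    "<s_date>2023-04-28</s_date></s_header>"
    "<s_total>12.50</s_total>"
)
print(json.dumps(tokens_to_json(decoded)))
# {"header": {"invoice_no": "A-17", "date": "2023-04-28"}, "total": "12.50"}
```

In practice the model's processor handles this conversion; the sketch only shows why the output can be consumed directly as key-value pairs without a separate OCR step.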

Recently I released a Donut model finetuned on invoices. Every so often I get asked how to train it on a custom dataset. A similar model, Pix2Struct, has also been released, and it claims to be significantly better. …

