April 28, 2023, 5:58 a.m. | Toon Beerten

Towards Data Science - Medium towardsdatascience.com

Donut versus Pix2Struct on custom data

Donut and Pix2Struct are image-to-text models that bring the simplicity of pure pixel inputs to visual language understanding tasks. Simply put: an image goes in and the extracted index fields come out as JSON.
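
To make the "JSON comes out" step concrete, here is a minimal sketch of how a generated tag sequence could be turned into a dictionary. It assumes the model emits Donut-style `<s_key>value</s_key>` tags; the field names (`invoice_no`, `total`, `seller`) and the helper `tags_to_dict` are hypothetical and chosen only for illustration. It is a simplified stand-in, not the converter that ships with the Hugging Face transformers library (`DonutProcessor.token2json`), and it does not handle lists or a tag nested inside a tag of the same name.

```python
import json
import re


def tags_to_dict(sequence: str) -> dict:
    """Convert a flat tag sequence such as '<s_total>12.50</s_total>' into a dict.

    Hypothetical helper for illustration only; real model output may also
    contain list separators and special tokens that this sketch ignores.
    """
    result = {}
    # Match each <s_key>...</s_key> pair. The backreference \1 makes sure the
    # closing tag carries the same key as the opening tag.
    for match in re.finditer(r"<s_(\w+)>(.*?)</s_\1>", sequence, flags=re.DOTALL):
        key, value = match.group(1), match.group(2).strip()
        # Recurse when the value itself contains nested tags.
        result[key] = tags_to_dict(value) if "<s_" in value else value
    return result


# Assumed example output for an invoice image (made-up values).
sequence = (
    "<s_invoice_no>INV-001</s_invoice_no>"
    "<s_total>12.50</s_total>"
    "<s_seller><s_name>ACME</s_name></s_seller>"
)
fields = tags_to_dict(sequence)
print(json.dumps(fields, indent=2))
```

All values stay strings here; converting amounts or dates to typed values would be a separate post-processing step.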

Recently I released a Donut model fine-tuned on invoices. Every so often I get asked how to train it on a custom dataset. A similar model, Pix2Struct, has also been released, and it claims to be significantly better. …

