May 8, 2023, 12:46 a.m. | Maksym Lysak, Ahmed Nassar, Nikolaos Livathinos, Christoph Auer, Peter Staar

cs.CV updates on arXiv.org

Extracting tables from documents is a crucial task in any document conversion
pipeline. Recently, transformer-based models have demonstrated that table
structure can be recognized with impressive accuracy using
Image-to-Markup-Sequence (Im2Seq) approaches. Taking only the image of a table,
such models predict a sequence of tokens (e.g. in HTML or LaTeX) that represents
the structure of the table. Since the token representation of the table
structure has a significant impact on the accuracy and run-time performance of
any Im2Seq model, we investigate in …
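
To make the idea of a structure-token sequence concrete, here is a minimal sketch of what an HTML target sequence could look like for a plain grid table. It is an illustration only, not the tokenization studied in the paper: the function name `html_structure_tokens` is hypothetical, and the sketch assumes a table with no spanning cells and no cell content.

```python
from typing import List


def html_structure_tokens(n_rows: int, n_cols: int) -> List[str]:
    """Build HTML structure tokens for a plain n_rows x n_cols table.

    Assumption: no row or column spans and no cell content, so every
    cell is just an opening and a closing <td> token.
    """
    tokens = ["<table>"]
    for _ in range(n_rows):
        tokens.append("<tr>")
        for _ in range(n_cols):
            tokens.extend(["<td>", "</td>"])
        tokens.append("</tr>")
    tokens.append("</table>")
    return tokens


if __name__ == "__main__":
    tokens = html_structure_tokens(2, 3)
    print(" ".join(tokens))
    print(len(tokens))  # 18 tokens for a 2 x 3 table
```

Even this small 2 x 3 table needs 18 tokens, which hints at why the choice of token representation affects the sequence length an Im2Seq model has to predict.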
