Optimized Table Tokenization for Table Structure Recognition. (arXiv:2305.03393v1 [cs.CV])
cs.CV updates on arXiv.org
Extracting tables from documents is a crucial task in any document conversion
pipeline. Recently, transformer-based models have demonstrated that
table structure can be recognized with impressive accuracy using
Image-to-Markup-Sequence (Im2Seq) approaches. Taking only the image of a table,
such models predict a sequence of tokens (e.g. in HTML or LaTeX) that represent
the structure of the table. Since the token representation of the table
structure has a significant impact on the accuracy and run-time performance of
any Im2Seq model, we investigate in …
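
To make the notion of a structure token sequence concrete, here is a minimal sketch of what an HTML-style target sequence could look like for a plain grid. It assumes a table without spanning cells and an illustrative token vocabulary; the function name `html_structure_tokens` is hypothetical, and this is not presented as the tokenization investigated in the paper.

```python
from typing import List


def html_structure_tokens(n_rows: int, n_cols: int) -> List[str]:
    """Return structure-only HTML tokens for a plain n_rows x n_cols grid.

    Assumption: no row or column spans and no cell content, only structure.
    """
    tokens = ["<table>"]
    for _ in range(n_rows):
        tokens.append("<tr>")
        for _ in range(n_cols):
            # Each cell contributes an opening and a closing token.
            tokens.extend(["<td>", "</td>"])
        tokens.append("</tr>")
    tokens.append("</table>")
    return tokens


if __name__ == "__main__":
    tokens = html_structure_tokens(2, 3)
    print(" ".join(tokens))
    print(len(tokens))  # 2 + n_rows * (2 + 2 * n_cols) = 18 for a 2 x 3 grid
```

Under these assumptions, even a 2 x 3 grid needs 18 tokens, and the count grows with every added cell. A model that emits one token per decoding step would therefore spend more steps on a longer representation, which is one plausible way the choice of tokens could affect run-time.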