March 26, 2024, 4:48 a.m. | Zhiming Mao, Haoli Bai, Lu Hou, Jiansheng Wei, Xin Jiang, Qun Liu, Kam-Fai Wong

cs.CV updates on arXiv.org

arXiv:2403.16516v1 Announce Type: cross
Abstract: Prior studies show that pre-training techniques can boost the performance of visual document understanding (VDU), which typically requires models to perceive and reason about both document text and layout (e.g., the locations of text and table cells). To this end, we propose visually guided generative text-layout pre-training, named ViTLP. Given a document image, the model optimizes hierarchical language and layout modeling objectives to generate the interleaved text and layout sequence. In addition, to address the …
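To make the interleaved text-layout generation concrete, below is a minimal, hypothetical sketch of how such a target sequence could be assembled from word-level OCR output, assuming bounding boxes are quantized into 1000 discrete location tokens. The token names, bucket count, and word-level granularity are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch: interleave text tokens with quantized layout tokens.
# Coordinate bins, token names, and word-level granularity are assumptions.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1), page-relative in [0, 1]


def quantize(coord: float, num_bins: int = 1000) -> int:
    """Map a relative coordinate to a discrete layout-token index."""
    return min(int(coord * num_bins), num_bins - 1)


def build_interleaved_sequence(words: List[Word]) -> List[str]:
    """Build a target sequence where each word is followed by its four layout tokens."""
    sequence: List[str] = []
    for w in words:
        sequence.append(w.text)  # a real tokenizer would subword-split here
        sequence.extend(f"<loc_{quantize(c)}>" for c in w.bbox)
    return sequence


# Example: two words extracted from a scanned page
words = [
    Word("Invoice", (0.12, 0.05, 0.30, 0.09)),
    Word("Total:", (0.12, 0.80, 0.22, 0.84)),
]
print(build_interleaved_sequence(words))
# ['Invoice', '<loc_120>', '<loc_50>', '<loc_300>', '<loc_90>', 'Total:', ...]
```

A decoder trained on sequences like this can then be supervised with a language-modeling loss on the text tokens and a layout-modeling loss on the location tokens, which is one way the hierarchical objectives described in the abstract could be realized.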

