all AI news
Visually Guided Generative Text-Layout Pre-training for Document Intelligence
March 26, 2024, 4:48 a.m. | Zhiming Mao, Haoli Bai, Lu Hou, Jiansheng Wei, Xin Jiang, Qun Liu, Kam-Fai Wong
cs.CV updates on arXiv.org arxiv.org
Abstract: Prior study shows that pre-training techniques can boost the performance of visual document understanding (VDU), which typically requires models to gain abilities to perceive and reason both document texts and layouts (e.g., locations of texts and table-cells). To this end, we propose visually guided generative text-layout pre-training, named ViTLP. Given a document image, the model optimizes hierarchical language and layout modeling objectives to generate the interleaved text and layout sequence. In addition, to address the …
abstract arxiv boost cells cs.cl cs.cv document document understanding generative intelligence locations performance pre-training prior reason shows study table text training type understanding visual
More from arxiv.org / cs.CV updates on arXiv.org
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
1 day, 13 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Global Data Architect, AVP - State Street Global Advisors
@ State Street | Boston, Massachusetts
Data Engineer
@ NTT DATA | Pune, MH, IN