all AI news
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
March 20, 2024, 4:46 a.m. | Anwen Hu, Haiyang Xu, Jiabo Ye, Ming Yan, Liang Zhang, Bo Zhang, Chen Li, Ji Zhang, Qin Jin, Fei Huang, Jingren Zhou
cs.CV updates on arXiv.org arxiv.org
Abstract: Structure information is critical for understanding the semantics of text-rich images, such as documents, tables, and charts. Existing Multimodal Large Language Models (MLLMs) for Visual Document Understanding are equipped with text recognition ability but lack general structure understanding abilities for text-rich document images. In this work, we emphasize the importance of structure information in Visual Document Understanding and propose the Unified Structure Learning to boost the performance of MLLMs. Our Unified Structure Learning comprises structure-aware …
arxiv cs.cv document document understanding free ocr type understanding
More from arxiv.org / cs.CV updates on arXiv.org
Compact 3D Scene Representation via Self-Organizing Gaussian Grids
2 days, 18 hours ago |
arxiv.org
Fingerprint Matching with Localized Deep Representation
2 days, 18 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Founding AI Engineer, Agents
@ Occam AI | New York
AI Engineer Intern, Agents
@ Occam AI | US
AI Research Scientist
@ Vara | Berlin, Germany and Remote
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne