all AI news
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering
April 22, 2024, 4:45 a.m. | Yihao Ding, Kaixuan Ren, Jiabin Huang, Siwen Luo, Soyeon Caren Han
cs.CV updates on arXiv.org arxiv.org
Abstract: Document Question Answering (QA) presents a challenge in understanding visually-rich documents (VRD), particularly those dominated by lengthy textual content like research journal articles. Existing studies primarily focus on real-world documents with sparse text, while challenges persist in comprehending the hierarchical semantic relations among multiple pages to locate multimodal components. To address this gap, we propose PDF-MVQA, which is tailored for research journal articles, encompassing multiple pages and multimodal information retrieval. Unlike traditional machine reading comprehension …
abstract articles arxiv challenge challenges cs.cl cs.cv dataset document documents focus hierarchical information journal multimodal pdf question question answering relations research retrieval semantic studies text textual type understanding visual world
More from arxiv.org / cs.CV updates on arXiv.org
Compact 3D Scene Representation via Self-Organizing Gaussian Grids
1 day, 14 hours ago |
arxiv.org
Fingerprint Matching with Localized Deep Representation
1 day, 14 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Founding AI Engineer, Agents
@ Occam AI | New York
AI Engineer Intern, Agents
@ Occam AI | US
AI Research Scientist
@ Vara | Berlin, Germany and Remote
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne