Oct. 25, 2022, 1:16 a.m. | Zhenrong Zhang, Jiefeng Ma, Jun Du, Licheng Wang, Jianshu Zhang

cs.CV updates on arXiv.org arxiv.org

Document intelligence as a relatively new research topic supports many
business applications. Its main task is to automatically read, understand, and
analyze documents. However, due to the diversity of formats (invoices, reports,
forms, etc.) and layouts in documents, it is difficult to make machines
understand documents. In this paper, we present the GraphDoc, a multimodal
graph attention-based model for various document understanding tasks. GraphDoc
is pre-trained in a multimodal framework by utilizing text, layout, and image
information simultaneously. In a …

arxiv attention document understanding graph multimodal network pre-training training understanding

Founding AI Engineer, Agents

@ Occam AI | New York

AI Engineer Intern, Agents

@ Occam AI | US

AI Research Scientist

@ Vara | Berlin, Germany and Remote

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Consultant Senior Power BI & Azure - CDI - H/F

@ Talan | Lyon, France