Multimodal Pre-training Based on Graph Attention Network for Document Understanding. (arXiv:2203.13530v2 [cs.CV] UPDATED)
cs.CV updates on arXiv.org
Document intelligence, a relatively new research topic, supports many
business applications. Its main task is to automatically read, understand, and
analyze documents. However, the diversity of document formats (invoices, reports,
forms, etc.) and layouts makes it difficult for machines to
understand them. In this paper, we present GraphDoc, a multimodal
graph attention-based model for various document understanding tasks. GraphDoc
is pre-trained in a multimodal framework by utilizing text, layout, and image
information simultaneously. In a …