XDoc: Unified Pre-training for Cross-Format Document Understanding. (arXiv:2210.02849v1 [cs.CL])
cs.CL updates on arXiv.org arxiv.org
The recent surge of pre-training has driven rapid progress in document
understanding. The pre-training and fine-tuning framework has been
effectively used to tackle texts in various formats, including plain texts,
document texts, and web texts. Despite achieving promising performance,
existing pre-trained models usually target one specific document format at a
time, making it difficult to combine knowledge from multiple document formats.
To address this, we propose XDoc, a unified pre-trained model that handles
different document formats in a single …