Knowing Where and What: Unified Word Block Pretraining for Document Understanding. (arXiv:2207.13979v2 [cs.CL] UPDATED)

Aug. 1, 2022, 1:11 a.m. | Song Tao, Zijian Wang, Tiantian Fan, Canjie Luo, Can Huang

cs.CL updates on arXiv.org

Due to the complex layouts of documents, it is challenging to extract
information from them. Most previous studies develop multimodal pre-trained
models in a self-supervised way. In this paper, we focus on the embedding
learning of word blocks containing text and layout information, and propose
UTel, a language model with Unified TExt and Layout pre-training. Specifically,
we propose two pre-training tasks: Surrounding Word Prediction (SWP) for
layout learning, and Contrastive learning of Word Embeddings (CWE) for
identifying different word …
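
The abstract is cut off before it describes how CWE is formulated, so the following is only a generic illustration of what a contrastive objective over embeddings can look like, not the paper's method. It is a minimal InfoNCE-style sketch under stated assumptions: the function name `info_nce_loss`, the `temperature` parameter, and the idea that row i of `anchors` pairs with row i of `positives` are all hypothetical choices made for this example.

```python
import math


def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (illustrative; NOT the paper's CWE objective).

    Assumption: anchors[i] and positives[i] are two embeddings of the same
    item, and every other row of `positives` acts as a negative for anchor i.
    """

    def normalize(vec):
        # Scale to unit length so the dot product is a cosine similarity.
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]

    total = 0.0
    for i, anchor in enumerate(a):
        # Similarity of this anchor to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature for cand in p]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        # Negative log-probability assigned to the matching candidate.
        total += log_denominator - logits[i]
    return total / len(a)


aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

With these toy vectors, `aligned` is much smaller than `mismatched`, which is the behavior a contrastive loss is meant to reward: matching pairs are pulled together and non-matching pairs are pushed apart.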


