TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models. (arXiv:2109.10282v4 [cs.CL] UPDATED)

Aug. 18, 2022, 1:12 a.m. | Minghao Li, Tengchao Lv, Jingye Chen, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei

cs.CV updates on arXiv.org

Text recognition is a long-standing research problem for document
digitalization. Existing approaches are usually built on a CNN for image
understanding and an RNN for character-level text generation. In addition, a
separate language model is usually needed as a post-processing step to improve
overall accuracy. In this paper, we propose an end-to-end text recognition
approach with pre-trained image Transformer and text Transformer models, namely
TrOCR, which leverages the Transformer architecture for both image
understanding and wordpiece-level text generation. The TrOCR model …
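
The data flow the abstract describes (image in, wordpieces out, no separate post-processing language model) can be illustrated with a toy sketch. This is not the TrOCR implementation: `split_into_patches`, `greedy_decode`, `join_wordpieces`, and `scripted_decoder` are hypothetical names, the fixed-size patch split is an assumption about how an image Transformer typically consumes its input, the `##` continuation marker is the BERT-style wordpiece convention assumed here for illustration, and the "decoder" is a scripted stand-in rather than a trained model.

```python
from typing import Callable, List, Sequence

Patch = List[float]


def split_into_patches(image: Sequence[Sequence[float]], patch: int) -> List[Patch]:
    """Cut a 2-D grayscale image into non-overlapping patch x patch tiles.

    Each tile is flattened row by row. Assumption: this mirrors the usual
    way an image Transformer turns pixels into a token sequence.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    patches: List[Patch] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def greedy_decode(patches: List[Patch],
                  next_piece: Callable[[List[Patch], List[str]], str],
                  max_len: int = 16,
                  eos: str = "</s>") -> List[str]:
    """Emit one wordpiece at a time until the end marker or max_len.

    next_piece stands in for a text Transformer decoder that sees the
    encoded image and the pieces produced so far.
    """
    pieces: List[str] = []
    for _ in range(max_len):
        piece = next_piece(patches, pieces)
        if piece == eos:
            break
        pieces.append(piece)
    return pieces


def join_wordpieces(pieces: List[str]) -> str:
    """Merge wordpieces into text, assuming '##' marks a continuation."""
    text = ""
    for piece in pieces:
        if piece.startswith("##"):
            text += piece[2:]
        else:
            text += (" " if text else "") + piece
    return text


def scripted_decoder(patches: List[Patch], so_far: List[str]) -> str:
    """Hypothetical stand-in for a trained decoder: replays a fixed script."""
    script = ["tr", "##oc", "##r", "reads", "text", "</s>"]
    return script[len(so_far)]
```

Calling `join_wordpieces(greedy_decode(split_into_patches(image, 4), scripted_decoder))` on any image whose sides are multiples of 4 walks through the whole pipeline. In a real system the scripted function would be replaced by a pre-trained decoder that attends to the encoder output.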

arxiv character recognition optical character recognition pre-trained models transformer
