Web: http://arxiv.org/abs/2108.09717

Jan. 31, 2022, 2:10 a.m. | Arka Ujjal Dey, Ernest Valveny, Gaurav Harit

cs.CV updates on arXiv.org

The open-ended question answering task of Text-VQA requires reading and
reasoning about local, often previously unseen, scene-text content of an image.
We address the zero-shot nature of this problem by proposing the generalized
use of external knowledge to augment our understanding of that scene text.
We design a framework to extract, validate, and reason with knowledge using a
standard multimodal transformer for vision language understanding tasks.
Through empirical evidence and qualitative results, we demonstrate how external
knowledge can highlight instance-only …
