KTVQA: Generalized use of External Knowledge to empower Scene Text in Text-VQA. (arXiv:2108.09717v6 [cs.CV] UPDATED)
Web: http://arxiv.org/abs/2108.09717
Jan. 31, 2022, 2:10 a.m. | Arka Ujjal Dey, Ernest Valveny, Gaurav Harit
cs.CV updates on arXiv.org arxiv.org
The open-ended question answering task of Text-VQA requires reading and
reasoning about local, often previously unseen, scene-text content of an image.
We address the zero-shot nature of this problem by proposing the generalized
use of external knowledge to augment our understanding of that scene text.
We design a framework to extract, validate, and reason with knowledge using a
standard multimodal transformer for vision-language understanding tasks.
Through empirical evidence and qualitative results, we demonstrate how external
knowledge can highlight instance-only …