Jan. 17, 2022, 2:10 a.m. | Feng Gao, Qing Ping, Govind Thattai, Aishwarya Reganti, Ying Nian Wu, Prem Natarajan

cs.CL updates on arXiv.org arxiv.org

Outside-knowledge visual question answering (OK-VQA) requires the agent to
comprehend the image, make use of relevant knowledge from the entire web, and
digest all the information to answer the question. Most previous work addresses
the problem by first fusing the image and question in a multi-modal space,
an approach that is inflexible for further fusion with a vast amount of external
knowledge. In this paper, we call for a paradigm shift for the OK-VQA task,
which transforms the image into plain text, …
