Web: http://arxiv.org/abs/2205.01883

May 5, 2022, 1:10 a.m. | Soravit Changpinyo, Doron Kukliansky, Idan Szpektor, Xi Chen, Nan Ding, Radu Soricut

cs.CV updates on arXiv.org arxiv.org

Visual Question Answering (VQA) has benefited from increasingly sophisticated
models, but has not enjoyed the same level of engagement in terms of data
creation. In this paper, we propose a method that automatically derives VQA
examples at volume, by leveraging the abundance of existing image-caption
annotations combined with neural models for textual question generation. We
show that the resulting data is of high-quality. VQA models trained on our data
improve state-of-the-art zero-shot accuracy by double digits and achieve a
level …

arxiv cv image

More from arxiv.org / cs.CV updates on arXiv.org

Director, Applied Mathematics & Computational Research Division

@ Lawrence Berkeley National Lab | Berkeley, Ca

Business Data Analyst

@ MainStreet Family Care | Birmingham, AL

Assistant/Associate Professor of the Practice in Business Analytics

@ Georgetown University McDonough School of Business | Washington DC

Senior Data Science Writer

@ NannyML | Remote

Director of AI/ML Engineering

@ Armis Industries | Remote (US only), St. Louis, California

Digital Analytics Manager

@ Patagonia | Ventura, California