May 25, 2022, 1:13 a.m. | Aishwarya Agrawal, Ivana Kajić, Emanuele Bugliarello, Elnaz Davoodi, Anita Gergely, Phil Blunsom, Aida Nematzadeh

cs.CV updates on arXiv.org

Vision-and-language (V&L) models pretrained on large-scale multimodal data
have demonstrated strong performance on various tasks such as image captioning
and visual question answering (VQA). The quality of such models is commonly
assessed by measuring their performance on unseen data that typically comes
from the same distribution as the training data. However, we observe that these
models exhibit poor out-of-distribution (OOD) generalization on the task of
VQA. To better understand the underlying causes of poor generalization, we
comprehensively investigate the performance of …
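As a concrete illustration of the evaluation gap the abstract describes, the sketch below scores a single VQA model on an in-distribution (IID) test split and on an out-of-distribution (OOD) split and compares the two numbers. Everything here is a hypothetical toy, not the paper's pipeline: the `prior_only_model`, the four-example datasets, and the simplified exact-match scoring (real VQA accuracy averages agreement over multiple human annotators) are all assumptions. The toy model answers from training-set answer priors alone, a failure mode commonly discussed in the VQA literature, which is one way a model can look strong IID while collapsing OOD.

```python
# Minimal sketch: compare IID vs. OOD accuracy for one VQA model.
# All names and data below are hypothetical placeholders for illustration.

def vqa_accuracy(predict_answer, dataset):
    """Fraction of questions answered exactly right.

    Simplification: official VQA scoring soft-matches against
    multiple human answers; exact match is enough for this sketch.
    """
    correct = sum(
        1
        for image, question, answer in dataset
        if predict_answer(image, question) == answer
    )
    return correct / len(dataset)


def prior_only_model(image, question):
    """Toy model that ignores the image and answers from dataset
    priors (e.g., 'yes' for yes/no questions, '2' for counting)."""
    return "yes" if question.startswith("is") else "2"


# IID split: answer distribution matches the priors the model learned.
iid_test = [
    ("img1", "is this a cat", "yes"),
    ("img2", "how many dogs", "2"),
]
# OOD split: same question types, shifted answer distribution.
ood_test = [
    ("img3", "is this a cat", "no"),
    ("img4", "how many dogs", "4"),
]

print(f"IID accuracy: {vqa_accuracy(prior_only_model, iid_test):.2f}")  # 1.00
print(f"OOD accuracy: {vqa_accuracy(prior_only_model, ood_test):.2f}")  # 0.00
```

On this toy setup the same model scores perfectly in distribution and fails completely out of distribution, which is the kind of gap that evaluation on held-out data from the training distribution cannot surface.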
