all AI news
Rethinking Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization. (arXiv:2205.12191v1 [cs.CL])
cs.CV updates on arXiv.org arxiv.org
Vision-and-language (V&L) models pretrained on large-scale multimodal data
have demonstrated strong performance on various tasks such as image captioning
and visual question answering (VQA). The quality of such models is commonly
assessed by measuring their performance on unseen data that typically comes
from the same distribution as the training data. However, we observe that these
models exhibit poor out-of-distribution (OOD) generalization on the task of
VQA. To better understand the underlying causes of poor generalization, we
comprehensively investigate performance of …
arxiv case case study distribution evaluation practices question answering study