April 17, 2024, 4:46 a.m. | Robik Shrestha, Kushal Kafle, Christopher Kanan

cs.CL updates on arXiv.org

arXiv:2004.05704v3 Announce Type: replace-cross
Abstract: Existing Visual Question Answering (VQA) methods tend to exploit dataset biases and spurious statistical correlations, instead of producing right answers for the right reasons. To address this issue, recent bias mitigation methods for VQA propose to incorporate visual cues (e.g., human attention maps) to better ground the VQA models, showcasing impressive gains. However, we show that the performance improvements are not a result of improved visual grounding, but a regularization effect which prevents over-fitting to …
