Visual Grounding Methods for VQA are Working for the Wrong Reasons!
April 17, 2024, 4:46 a.m. | Robik Shrestha, Kushal Kafle, Christopher Kanan
Source: cs.CL updates on arXiv.org
Abstract: Existing Visual Question Answering (VQA) methods tend to exploit dataset biases and spurious statistical correlations, instead of producing right answers for the right reasons. To address this issue, recent bias mitigation methods for VQA propose to incorporate visual cues (e.g., human attention maps) to better ground the VQA models, showcasing impressive gains. However, we show that the performance improvements are not a result of improved visual grounding, but a regularization effect which prevents over-fitting to …
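For context on the kind of "visual cue" grounding the abstract describes, below is a minimal, illustrative sketch (in PyTorch) of an auxiliary loss that aligns a VQA model's attention over image regions with a human attention map. This is an assumption-based example of the general approach, not the specific method proposed or analyzed in the paper; the names `grounding_loss`, `total_loss`, and the tensor shapes are hypothetical.

```python
# Illustrative sketch only: one common way visual-cue methods try to ground VQA
# models is to add an auxiliary term that pulls the model's region attention
# toward a human attention map. All names and shapes here are assumptions.
import torch
import torch.nn.functional as F


def grounding_loss(model_attn: torch.Tensor, human_attn: torch.Tensor) -> torch.Tensor:
    """KL divergence between human and model attention distributions over regions.

    model_attn: (batch, num_regions) unnormalized attention logits from the model
    human_attn: (batch, num_regions) non-negative human importance scores
    """
    log_p_model = F.log_softmax(model_attn, dim=-1)
    p_human = human_attn / human_attn.sum(dim=-1, keepdim=True).clamp(min=1e-8)
    return F.kl_div(log_p_model, p_human, reduction="batchmean")


def total_loss(answer_logits, answer_targets, model_attn, human_attn, lam=0.1):
    """Standard VQA answer cross-entropy plus the grounding regularizer, weighted by lam."""
    vqa_loss = F.cross_entropy(answer_logits, answer_targets)
    return vqa_loss + lam * grounding_loss(model_attn, human_attn)
```

The abstract's point is that the gains from terms like this may stem from a regularization effect that curbs over-fitting rather than from genuinely improved visual grounding.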