ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs

June 13, 2024, 4:46 a.m. | Irene Huang, Wei Lin, M. Jehanzeb Mirza, Jacob A. Hansen, Sivan Doveh, Victor Ion Butoi, Roei Herzig, Assaf Arbelle, Hilde Kuehne, Trevor Darrell, Chua

cs.CV updates on arXiv.org

arXiv:2406.08164v1 Announce Type: new
Abstract: Compositional Reasoning (CR) entails grasping the significance of attributes, relations, and word order. Recent Vision-Language Models (VLMs), comprising a visual encoder and a Large Language Model (LLM) decoder, have demonstrated remarkable proficiency in such reasoning tasks. This prompts a crucial question: have VLMs effectively tackled the CR challenge? We conjecture that existing CR benchmarks may not adequately push the boundaries of modern VLMs due to the reliance on an LLM-only negative text generation pipeline. Consequently, …
