When and why vision-language models behave like bags-of-words, and what to do about it? (arXiv:2210.01936v2 [cs.CV] UPDATED)

Oct. 7, 2022, 1:16 a.m. | Mert Yuksekgonul, Federico Bianchi, Pratyusha Kalluri, Dan Jurafsky, James Zou

Source: cs.CV updates on arXiv.org

Despite the success of large vision and language models (VLMs) in many
downstream applications, it is unclear how well they encode compositional
information. Here, we create the Attribution, Relation, and Order (ARO)
benchmark to systematically evaluate the ability of VLMs to understand
different types of relationships, attributes, and order. ARO consists of Visual
Genome Attribution, to test the understanding of objects' properties; Visual
Genome Relation, to test for relational understanding; and COCO &
Flickr30k-Order, to test for order sensitivity. ARO …
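
To make "order sensitivity" concrete, here is a minimal sketch of why a scorer that treats a caption as a bag of words cannot pass an order test. It is not the ARO benchmark or the authors' code: the function names `bag_of_words_score` and `shuffle_words`, the word-count cosine scorer, and the example caption are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def bag_of_words_score(caption_a: str, caption_b: str) -> float:
    """Cosine similarity between word-count vectors (hypothetical scorer).

    Word order is discarded by construction, so any permutation of a
    caption receives the same score as the original caption.
    """
    a = Counter(caption_a.lower().split())
    b = Counter(caption_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (sum(v * v for v in a.values()) ** 0.5) * (
        sum(v * v for v in b.values()) ** 0.5
    )
    return dot / norm if norm else 0.0


def shuffle_words(caption: str, seed: int = 0) -> str:
    """Return the caption with its words permuted (illustrative perturbation)."""
    words = caption.split()
    rng = random.Random(seed)
    shuffled = words[:]
    # Retry a bounded number of times so the result differs from the input
    # whenever a different ordering exists.
    for _ in range(100):
        rng.shuffle(shuffled)
        if shuffled != words:
            break
    return " ".join(shuffled)


if __name__ == "__main__":
    original = "the horse is eating the grass"
    perturbed = shuffle_words(original)
    swapped = "the grass is eating the horse"
    # An order-blind scorer cannot separate these captions from the original.
    print(perturbed, bag_of_words_score(original, perturbed))
    print(swapped, bag_of_words_score(original, swapped))
```

An order-sensitive model would be expected to score the original caption above its shuffled or swapped variants for the matching image; the scorer above assigns them the same similarity, which is the failure mode the title refers to.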


