Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
April 30, 2024, 4:43 a.m. | Letitia Parcalabescu, Anette Frank
cs.LG updates on arXiv.org
Abstract: Vision and language models (VLMs) are currently the most generally performant architectures on multimodal tasks. Alongside their predictions, they can also produce explanations, either post-hoc or in chain-of-thought (CoT) settings. However, it is not clear how much they use the vision and text modalities when generating predictions or explanations. In this work, we investigate whether VLMs rely on the modalities differently when generating explanations as opposed to when they provide answers. We also evaluate the self-consistency …
Subjects: cs.AI, cs.CL, cs.CV, cs.LG
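The abstract asks how much each modality actually drives a VLM's output. One simple way to operationalize that question is an ablation probe: score the model's answer on the full input, then again with one modality blanked, and treat the drop in score as that modality's reliance. The sketch below is a hypothetical illustration of this idea, not the paper's method (the truncated abstract does not spell out its measures); `score`, `toy_score`, and all parameters are assumptions.

from typing import Callable, Tuple

def modality_reliance(
    score: Callable[[object, str], float],
    image: object,
    blank_image: object,
    text: str,
    blank_text: str,
) -> Tuple[float, float]:
    """Estimate how much a prediction relies on each modality by comparing
    the answer score on the full input against scores with one modality
    ablated (blank image / empty text). `score` stands in for a real VLM
    returning the log-probability of the answer given (image, text)."""
    full = score(image, text)
    image_drop = full - score(blank_image, text)   # loss from removing the image
    text_drop = full - score(image, blank_text)    # loss from removing the text
    total = abs(image_drop) + abs(text_drop) or 1.0  # avoid divide-by-zero
    return image_drop / total, text_drop / total   # normalized reliance shares

# Toy stand-in model so the sketch runs end to end: scores are higher when
# both modalities carry content. A real study would call a VLM here.
def toy_score(image: object, text: str) -> float:
    return -1.0 + (0.6 if image is not None else 0.0) + (0.3 if text else 0.0)

img_share, txt_share = modality_reliance(toy_score, "pixels", None, "a question", "")
print(f"image reliance: {img_share:.2f}, text reliance: {txt_share:.2f}")

Normalizing the two score drops to shares makes the measure comparable across examples; applying it separately to the answer and to the explanation tokens would expose the kind of answer-vs-explanation gap the abstract describes.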