March 22, 2024, 4:45 a.m. | Dingchen Yang, Bowen Cao, Guang Chen, Changjun Jiang

cs.CV updates on arXiv.org

arXiv:2403.14401v1 Announce Type: new
Abstract: Multi-modal Large Language Models (MLLMs) demonstrate remarkable success across various vision-language tasks. However, they suffer from visual hallucination, where the generated responses diverge from the provided image. Are MLLMs completely oblivious to accurate visual cues when they hallucinate? Our investigation reveals that the visual branch may simultaneously advocate both accurate and non-existent content. To address this issue, we propose Pensieve, a training-free method inspired by our observation that analogous visual hallucinations can arise among images …
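The abstract describes the mechanism only at a high level (training-free, exploiting the fact that analogous hallucinations recur across images), so the snippet below is a minimal, hypothetical sketch of one way such a reference-based correction could look: contrast next-token logits from the test image against logits from visually similar reference images, so that content the references also "advocate" (a likely shared hallucination) is down-weighted. The `model(...)` call signature, `reference_images`, and the weight `alpha` are illustrative assumptions, not Pensieve's actual API or formula.

```python
import torch

def contrastive_next_token_logits(model, prompt_ids, test_image,
                                  reference_images, alpha=1.0):
    """Sketch of reference-based contrastive decoding (hypothetical interface).

    Assumes an HF-style model that returns an object with a .logits
    tensor of shape (batch, seq_len, vocab_size).
    """
    # Next-token logits conditioned on the actual test image.
    test_logits = model(prompt_ids, image=test_image).logits[:, -1, :]

    # Average next-token logits over visually similar reference images.
    ref_logits = torch.stack([
        model(prompt_ids, image=img).logits[:, -1, :]
        for img in reference_images
    ]).mean(dim=0)

    # Amplify what is specific to the test image and down-weight content
    # the references also advocate, i.e. a likely shared hallucination.
    return (1 + alpha) * test_logits - alpha * ref_logits
```

Sampling from the returned logits instead of `test_logits` alone would, under these assumptions, preserve image-specific content while suppressing tokens that similar images would have triggered anyway.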
