Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination
March 22, 2024, 4:45 a.m. | Dingchen Yang, Bowen Cao, Guang Chen, Changjun Jiang
cs.CV updates on arXiv.org
Abstract: Multi-modal Large Language Models (MLLMs) demonstrate remarkable success across various vision-language tasks. However, they suffer from visual hallucination, where the generated responses diverge from the provided image. Are MLLMs completely oblivious to accurate visual cues when they hallucinate? Our investigation reveals that the visual branch may simultaneously advocate both accurate and non-existent content. To address this issue, we propose Pensieve, a training-free method inspired by our observation that analogous visual hallucinations can arise among images …
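The "retrospect-then-compare" idea suggested by the title can be illustrated as a contrastive adjustment of next-token logits: the model's logits on the test image are compared against logits obtained on visually similar reference images, so that candidate tokens the references also push for (shared, image-independent hallucinations) are down-weighted. The sketch below is a minimal illustration under that assumption; the function name, the averaging over references, and the `alpha` strength parameter are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def retrospect_then_compare(logits_img, logits_refs, alpha=1.0):
    """Contrast next-token logits from the test image against logits from
    retrieved reference images (hedged sketch, not the paper's exact method).

    logits_img:  (V,) logits conditioned on the actual input image
    logits_refs: (R, V) logits conditioned on R similar reference images
    alpha:       contrast strength (assumed hyperparameter)
    Returns a probability distribution over the vocabulary of size V.
    """
    # Consensus of the references: tokens they jointly favor are suspected
    # to be generic hallucinations rather than evidence from this image.
    ref_mean = np.mean(logits_refs, axis=0)
    # Amplify image-specific evidence, subtract the shared component.
    contrasted = (1.0 + alpha) * logits_img - alpha * ref_mean
    # Numerically stable softmax over the adjusted logits.
    z = contrasted - contrasted.max()
    p = np.exp(z)
    return p / p.sum()
```

For example, a token that only the reference images favor ends up with a lower probability than it would get from a plain softmax of the test image's logits, while tokens grounded in the actual image are preserved.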