Feb. 27, 2024, 5:48 a.m. | Yuhao Wang, Yusheng Liao, Heyang Liu, Hongcheng Liu, Yu Wang, Yanfeng Wang

cs.CV updates on arXiv.org

arXiv:2401.07529v2 Announce Type: replace
Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in visual perception and understanding. However, these models also suffer from hallucinations, which limit their reliability as AI systems. We believe that these hallucinations are partially due to the models' struggle with understanding what they can and cannot perceive from images, a capability we refer to as self-awareness in perception. Despite its importance, this aspect of MLLMs has been overlooked in prior studies. …
