Feb. 27, 2024, 5:48 a.m. | Yuhao Wang, Yusheng Liao, Heyang Liu, Hongcheng Liu, Yu Wang, Yanfeng Wang

cs.CV updates on arXiv.org arxiv.org

arXiv:2401.07529v2 Announce Type: replace
Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in visual perception and understanding. However, these models also suffer from hallucinations, which limit their reliability as AI systems. We believe that these hallucinations are partially due to the models' struggle with understanding what they can and cannot perceive from images, a capability we refer to as self-awareness in perception. Despite its importance, this aspect of MLLMs has been overlooked in prior studies. …

arxiv benchmark cs.cl cs.cv language language models large language large language models multimodal perception sap self-awareness type

Data Scientist (m/f/x/d)

@ Symanto Research GmbH & Co. KG | Spain, Germany

Aumni - Site Reliability Engineer III - MLOPS

@ JPMorgan Chase & Co. | Salt Lake City, UT, United States

Senior Data Analyst

@ Teya | Budapest, Hungary

Technical Analyst (Data Analytics)

@ Contact Government Services | Chicago, IL

Engineer, AI/Machine Learning

@ Masimo | Irvine, CA, United States

Private Bank - Executive Director: Data Science and Client / Business Intelligence

@ JPMorgan Chase & Co. | Mumbai, Maharashtra, India