MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception

Feb. 27, 2024, 5:48 a.m. | Yuhao Wang, Yusheng Liao, Heyang Liu, Hongcheng Liu, Yu Wang, Yanfeng Wang

cs.CV updates on arXiv.org

arXiv:2401.07529v2 Announce Type: replace
Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in visual perception and understanding. However, these models also suffer from hallucinations, which limit their reliability as AI systems. We believe that these hallucinations are partially due to the models' struggle with understanding what they can and cannot perceive from images, a capability we refer to as self-awareness in perception. Despite its importance, this aspect of MLLMs has been overlooked in prior studies. …


