MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception

Feb. 27, 2024, 5:48 a.m. | Yuhao Wang, Yusheng Liao, Heyang Liu, Hongcheng Liu, Yu Wang, Yanfeng Wang

cs.CV updates on arXiv.org | arxiv.org

arXiv:2401.07529v2 Announce Type: replace
Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in visual perception and understanding. However, these models also suffer from hallucinations, which limit their reliability as AI systems. We believe that these hallucinations are partially due to the models' struggle with understanding what they can and cannot perceive from images, a capability we refer to as self-awareness in perception. Despite its importance, this aspect of MLLMs has been overlooked in prior studies. …


