VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models
April 23, 2024, 4:48 a.m. | Haoyi Qiu, Wenbo Hu, Zi-Yi Dou, Nanyun Peng
cs.CV updates on arXiv.org
Abstract: Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs, undermining their reliability. A comprehensive quantitative evaluation is necessary to identify and understand the extent of hallucinations in these models. However, existing benchmarks are often limited in scope, focusing mainly on object hallucinations. Furthermore, current evaluation methods struggle to effectively address the subtle semantic distinctions between model outputs and reference data, as well as the balance between hallucination …
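The coverage-and-faithfulness framing in the abstract can be illustrated with a minimal set-based sketch. This is an assumption made for illustration only, not VALOR-EVAL's actual method (which, per the abstract, must handle subtle semantic distinctions rather than exact string matches): faithfulness asks how many of the model's mentioned items are grounded in the reference, while coverage asks how many reference items the model actually mentioned.

```python
# Toy sketch (hypothetical, not the paper's metric): set-based faithfulness
# and coverage over objects mentioned in a generated caption vs. a reference.
def faithfulness_and_coverage(predicted, reference):
    """Faithfulness: fraction of predicted items grounded in the reference.
    Coverage: fraction of reference items the model mentioned."""
    pred, ref = set(predicted), set(reference)
    overlap = pred & ref
    faithfulness = len(overlap) / len(pred) if pred else 1.0
    coverage = len(overlap) / len(ref) if ref else 1.0
    return faithfulness, coverage

f, c = faithfulness_and_coverage(
    ["dog", "frisbee", "car"],    # model output; "car" is a hallucination
    ["dog", "frisbee", "grass"],  # reference annotations; "grass" is missed
)
print(f, c)  # hallucination lowers faithfulness; omission lowers coverage
```

A hallucinated object ("car") lowers faithfulness without affecting coverage, while an omitted reference object ("grass") does the reverse, which is the balance the abstract says current evaluation methods struggle to capture.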