KNVQA: A Benchmark for Evaluating Knowledge-based VQA
June 14, 2024, 4:48 a.m. | Sirui Cheng, Siyu Zhang, Jiayi Wu, Muchen Lan
cs.CV updates on arXiv.org arxiv.org
Abstract: Within the multimodal field, large vision-language models (LVLMs) have made significant progress due to their strong perception and reasoning capabilities across visual and language systems. However, LVLMs remain plagued by two critical issues, object hallucination and factual accuracy, which limit their practicality in different scenarios. Furthermore, previous evaluation methods focus more on the comprehension and reasoning of language content while lacking a comprehensive evaluation of multimodal interactions, thereby resulting …