June 14, 2024, 4:48 a.m. | Sirui Cheng, Siyu Zhang, Jiayi Wu, Muchen Lan

cs.CV updates on arXiv.org

arXiv:2311.12639v2 Announce Type: replace
Abstract: Within the multimodal field, large vision-language models (LVLMs) have made significant progress owing to their strong perception and reasoning capabilities across visual and language systems. However, LVLMs remain plagued by two critical issues, object hallucination and factual inaccuracy, which limit their practicality in different scenarios. Furthermore, previous evaluation methods focus more on comprehension and reasoning of language content and lack a comprehensive evaluation of multimodal interactions, thereby resulting …
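To make the object-hallucination problem concrete, the following is a minimal, hypothetical sketch of how such an error could be quantified, in the spirit of CHAIR-style metrics: count the objects a model mentions that are absent from an image's ground-truth annotations. This is an illustrative assumption, not the benchmark proposed in the paper.

```python
def hallucination_rate(caption_objects, ground_truth_objects):
    """Fraction of objects mentioned by the model that are not
    present in the image's ground-truth annotation set."""
    mentioned = set(caption_objects)
    truth = set(ground_truth_objects)
    if not mentioned:
        return 0.0  # nothing mentioned, nothing hallucinated
    return len(mentioned - truth) / len(mentioned)

# Example: the model mentions "car", which is not in the image.
rate = hallucination_rate(["dog", "frisbee", "car"], {"dog", "frisbee"})
```

In this toy example one of three mentioned objects is hallucinated, giving a rate of 1/3; a real benchmark would additionally need object extraction from free-form captions and synonym matching.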
