June 14, 2024, 4:48 a.m. | Sirui Cheng, Siyu Zhang, Jiayi Wu, Muchen Lan

cs.CV updates on arXiv.org

arXiv:2311.12639v2 Announce Type: replace
Abstract: Within the multimodal field, large vision-language models (LVLMs) have made significant progress owing to their strong perception and reasoning capabilities across visual and language systems. However, LVLMs remain plagued by two critical issues, object hallucination and factual inaccuracy, which limit their practicality in different scenarios. Furthermore, previous evaluation methods focus more on comprehension and reasoning of language content and lack a comprehensive evaluation of multimodal interactions, thereby resulting …
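To make the object-hallucination problem concrete, the following is a minimal, hypothetical sketch of how such an error could be quantified, in the spirit of CHAIR-style metrics: count the objects a model mentions that are absent from an image's ground-truth annotations. This is an illustrative assumption, not the benchmark proposed in the paper.

```python
def hallucination_rate(caption_objects, ground_truth_objects):
    """Fraction of objects mentioned by the model that are not
    present in the image's ground-truth annotation set."""
    mentioned = set(caption_objects)
    truth = set(ground_truth_objects)
    if not mentioned:
        return 0.0  # nothing mentioned, nothing hallucinated
    return len(mentioned - truth) / len(mentioned)

# Example: the model mentions "car", which is not in the image.
rate = hallucination_rate(["dog", "frisbee", "car"], {"dog", "frisbee"})
```

In this toy example one of three mentioned objects is hallucinated, giving a rate of 1/3; a real benchmark would additionally need object extraction from free-form captions and synonym matching.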
