A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision Language Models

Feb. 29, 2024, 5:45 a.m. | Xiujie Song, Mengyue Wu, Kenny Q. Zhu, Chunhao Zhang, Yanyi Chen

cs.CV updates on arXiv.org

arXiv:2402.18409v1 Announce Type: cross
Abstract: Large Vision Language Models (LVLMs), despite their recent success, have rarely been tested comprehensively for their cognitive abilities. Inspired by the prevalent use of the "Cookie Theft" task in human cognition tests, we propose a novel benchmark to evaluate the high-level cognitive abilities of LVLMs using images with rich semantics. The benchmark defines eight reasoning capabilities and consists of an image description task and a visual question answering task. Our evaluation on well-known LVLMs shows that there …


