GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation
Feb. 27, 2024, 5:47 a.m. | Yi Zong, Xipeng Qiu
cs.CV updates on arXiv.org arxiv.org
Abstract: Large Vision-Language Models (LVLMs) have demonstrated great abilities in image perception and language understanding. However, existing multimodal benchmarks focus on primary perception abilities and commonsense knowledge, which are insufficient to reflect the comprehensive capabilities of LVLMs. We propose GAOKAO-MM, a multimodal benchmark based on the Chinese College Entrance Examination (GAOKAO), comprising 8 subjects and 12 types of images, such as diagrams, function graphs, maps, and photos. GAOKAO-MM derives from native Chinese context and …