Feb. 13, 2024, 5:43 a.m. | Simon Ging, María A. Bravo, Thomas Brox

cs.LG updates on arXiv.org

The evaluation of text-generative vision-language models is a challenging yet crucial endeavor. By addressing the limitations of existing Visual Question Answering (VQA) benchmarks and proposing innovative evaluation methodologies, our research aims to advance understanding of these models' capabilities. We propose a novel VQA benchmark based on well-known visual classification datasets that enables a granular evaluation of text-generative vision-language models and their comparison with discriminative vision-language models. To improve the assessment of coarse answers on fine-grained classification tasks, we suggest …
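The abstract is truncated, but its core idea — recasting a visual classification dataset as VQA items and grading a generative model's free-form answer against the class label — can be sketched briefly. The following is a minimal illustration under assumptions of my own (the question template, the `VQAItem` structure, and the string-matching rule are all hypothetical), not the authors' benchmark code:

```python
# Minimal sketch (not the paper's implementation): wrap (image, class label)
# pairs from a classification dataset as VQA items, then score a generative
# model's free-form answer against the ground-truth label.
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class VQAItem:
    image_path: str
    question: str
    answer: str  # ground-truth class label


def classification_to_vqa(samples, question="What type of object is shown in the image?"):
    """Turn (image_path, class_label) pairs into VQA items.

    The question template is an illustrative assumption; the paper's
    benchmark may phrase questions differently per dataset."""
    return [VQAItem(img, question, label) for img, label in samples]


def answer_matches(prediction: str, label: str, threshold: float = 0.8) -> bool:
    """Loose string match: exact, containment, or high character overlap.

    This naive rule is exactly where coarse answers on fine-grained labels
    become problematic, which is the assessment gap the abstract points at."""
    pred, gold = prediction.strip().lower(), label.strip().lower()
    if pred == gold or gold in pred:
        return True
    return SequenceMatcher(None, pred, gold).ratio() >= threshold


# Example: a fine-grained label vs. a coarse generative answer.
items = classification_to_vqa([("img_001.jpg", "Boeing 747")])
print(answer_matches("a large passenger airplane", items[0].answer))      # False: coarse but arguably correct
print(answer_matches("this looks like a boeing 747 jet", items[0].answer))  # True
```

The first call in the example shows why plain string matching penalizes coarse yet sensible answers; the truncated sentence in the abstract presumably introduces the authors' remedy for this.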

