Feb. 13, 2024, 5:43 a.m. | Simon Ging, María A. Bravo, Thomas Brox

cs.LG updates on arXiv.org

The evaluation of text-generative vision-language models is a challenging yet crucial endeavor. By addressing the limitations of existing Visual Question Answering (VQA) benchmarks and proposing innovative evaluation methodologies, our research seeks to advance our understanding of these models' capabilities. We propose a novel VQA benchmark based on well-known visual classification datasets which allows a granular evaluation of text-generative vision-language models and their comparison with discriminative vision-language models. To improve the assessment of coarse answers on fine-grained classification tasks, we suggest …
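The abstract mentions assessing coarse answers on fine-grained classification tasks. As a purely illustrative sketch (not the paper's actual method), one way to give a generative model partial credit is to check a free-form answer against the fine-grained ground-truth label and, failing that, against its coarser ancestors in a label hierarchy. The hierarchy and scoring values below are hypothetical:

```python
# Hypothetical sketch: score a generated answer against a fine-grained
# ground-truth label, granting partial credit for a correct coarser answer.
# The label hierarchy and the 1.0 / 0.5 scores are illustrative only.

# Fine-grained label -> coarser ancestor labels (toy example).
LABEL_HIERARCHY = {
    "golden retriever": ["dog", "animal"],
    "sparrow": ["bird", "animal"],
}

def score_answer(answer: str, fine_label: str) -> float:
    """Return 1.0 for an exact fine-grained match, 0.5 for a match
    against a coarser ancestor label, and 0.0 otherwise."""
    answer = answer.strip().lower()
    if answer == fine_label:
        return 1.0
    if answer in LABEL_HIERARCHY.get(fine_label, []):
        return 0.5
    return 0.0

# A coarse but correct answer receives partial credit:
print(score_answer("dog", "golden retriever"))              # 0.5
print(score_answer("golden retriever", "golden retriever")) # 1.0
print(score_answer("cat", "golden retriever"))              # 0.0
```

In practice such matching would need semantic comparison (synonyms, paraphrases) rather than exact string equality, but the tiered-credit idea is the same.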
