all AI news
GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks. (arXiv:2311.01361v1 [cs.CV])
cs.CL updates on arXiv.org arxiv.org
Automatically evaluating vision-language tasks is challenging, especially
when it comes to reflecting human judgments due to limitations in accounting
for fine-grained details. Although GPT-4V has shown promising results in
various multi-modal tasks, leveraging GPT-4V as a generalist evaluator for
these tasks has not yet been systematically explored. We comprehensively
validate GPT-4V's capabilities for evaluation purposes, addressing tasks
ranging from foundational image-to-text and text-to-image synthesis to
high-level image-to-image translations and multi-images to text alignment. We
employ two evaluation methods, single-answer grading …
accounting arxiv fine-grained gpt gpt-4v human language limitations multi-modal tasks vision