Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction
Feb. 29, 2024, 5:45 a.m. | Koki Maeda, Shuhei Kurita, Taiki Miyanishi, Naoaki Okazaki
cs.CV updates on arXiv.org
Abstract: Given the accelerating progress of vision and language modeling, accurate evaluation of machine-generated image captions remains critical. To evaluate captions in closer alignment with human preferences, metrics need to discriminate between captions of varying quality and content. However, conventional metrics fall short, comparing only superficial word matches or embedding similarities; thus, they still need improvement. This paper presents VisCE$^2$, a vision language model-based caption evaluation method. Our method focuses on visual context, …
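To illustrate the "superficial word matches" the abstract critiques, here is a minimal sketch (not part of the paper) of a unigram-overlap F1 metric, a crude stand-in for n-gram metrics like BLEU or ROUGE. A caption that names the wrong object can still score well if it shares common function words with the reference — the kind of failure a visual-context-aware evaluator like VisCE$^2$ aims to catch.

```python
# Minimal sketch of a surface-level caption metric: unigram-overlap F1.
# Hypothetical example, not the paper's method or baseline code.
from collections import Counter

def unigram_f1(candidate: str, reference: str) -> float:
    """F1 over unigram counts between candidate and reference captions."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # shared word occurrences
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# A caption with the wrong object still scores highly on word overlap,
# while a correct paraphrase with different wording scores poorly:
print(unigram_f1("a man is riding a horse", "a man is riding a bicycle"))
print(unigram_f1("a man is riding a horse", "a rider on horseback"))
```

The first pair (wrong content, shared phrasing) scores far above the second (right content, different phrasing), which is exactly the discrimination failure the abstract describes.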