Distinctive Image Captioning via CLIP Guided Group Optimization. (arXiv:2208.04254v5 [cs.CV] UPDATED)
Aug. 30, 2022, 1:14 a.m. | Youyuan Zhang, Jiuniu Wang, Hao Wu, Wenjia Xu
cs.CV updates on arXiv.org
Image captioning models are usually trained on human-annotated ground-truth
captions, which tends to yield accurate but generic captions. In this paper, we
focus on generating distinctive captions that can distinguish the target image
from other similar images. To evaluate the distinctiveness of captions, we
introduce a series of metrics that use the large-scale vision-language
pre-trained model CLIP to quantify distinctiveness. To further improve the
distinctiveness of captioning models, we propose a simple and effective
training strategy that trains the …
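The abstract does not spell out the exact form of the CLIP-based metrics, but the core idea — a caption is distinctive if it matches its target image better than it matches visually similar distractor images — can be sketched with plain cosine similarity over precomputed CLIP embeddings. The function name `distinctiveness` and the "target minus mean distractor similarity" formula below are illustrative assumptions, not the paper's exact definition:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def distinctiveness(caption_emb, target_img_emb, distractor_img_embs):
    """Illustrative distinctiveness score for a caption (assumption, not
    the paper's exact metric): CLIP similarity to the target image minus
    the mean CLIP similarity to similar distractor images. A generic
    caption that fits many images scores near zero; a caption that fits
    only the target scores high."""
    target_sim = cosine(caption_emb, target_img_emb)
    distractor_sim = np.mean(
        [cosine(caption_emb, d) for d in distractor_img_embs]
    )
    return target_sim - distractor_sim

# Toy 2-D "embeddings": the target and one similar distractor image.
target = np.array([1.0, 0.0])
distractors = [np.array([0.0, 1.0])]

specific_caption = np.array([1.0, 0.0])  # aligned only with the target
generic_caption = np.array([1.0, 1.0])   # equally close to both images

print(distinctiveness(specific_caption, target, distractors))
print(distinctiveness(generic_caption, target, distractors))
```

In practice the embeddings would come from a CLIP image encoder and text encoder; the toy vectors above just make the behavior of the score visible.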