Aug. 16, 2022, 1:14 a.m. | Youyuan Zhang, Jiuniu Wang, Hao Wu, Wenjia Xu

cs.CV updates on arXiv.org

Image captioning models are usually trained on human-annotated
ground-truth captions, which leads them to generate accurate but generic captions. In
this paper, we focus on generating distinctive captions that can
distinguish the target image from other similar images. To evaluate the
distinctiveness of captions, we introduce a series of metrics that use the
large-scale vision-language pre-trained model CLIP to quantify
distinctiveness. To further improve the distinctiveness of captioning models,
we propose a simple and effective training strategy that trains …
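The excerpt does not spell out the paper's metrics, but the core idea of scoring distinctiveness with CLIP can be sketched as follows: compare the caption's CLIP similarity to the target image against its similarity to a set of visually similar distractor images. The function below is a minimal illustration of that idea, not the authors' actual metric; the model checkpoint, the margin-style score, and the `distinctiveness_score` helper are all assumptions for the sake of the example.

```python
# Hedged sketch of a CLIP-based distinctiveness proxy (not the paper's exact metric).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

def distinctiveness_score(caption: str,
                          target: Image.Image,
                          distractors: list[Image.Image]) -> float:
    """Score how much better the caption matches the target image
    than a set of similar-looking distractor images (assumed formulation)."""
    images = [target] + distractors
    inputs = processor(text=[caption], images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # logits_per_text has shape (1, num_images): scaled caption-image similarities.
    sims = out.logits_per_text[0]
    target_sim = sims[0].item()
    distractor_sim = sims[1:].mean().item()
    # Higher margin = the caption fits the target better than its look-alikes.
    return target_sim - distractor_sim
```

Under this sketch, a generic caption (e.g. "a dog on the grass") would score near zero because it matches the distractors almost as well as the target, while a caption mentioning attributes unique to the target image would yield a larger margin.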

arxiv captioning clip cv image optimization
