Aug. 12, 2022, 1:12 a.m. | Youyuan Zhang, Jiuniu Wang, Hao Wu, Wenjia Xu

cs.CV updates on arXiv.org arxiv.org

Image captioning models are usually trained on human-annotated
ground-truth captions, which tends to yield accurate but generic captions. In
this paper, we focus on generating distinctive captions that can
distinguish the target image from other similar images. To evaluate the
distinctiveness of captions, we introduce a series of metrics that use the
large-scale vision-language pre-training model CLIP to quantify
distinctiveness. To further improve the distinctiveness of captioning models,
we propose a simple and effective training strategy which trains …
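As a rough illustration (not the paper's exact metric), one way to quantify distinctiveness with CLIP-style embeddings is to compare a caption's similarity to its target image against its average similarity to visually similar distractor images. The function below is model-agnostic and the embedding values are hypothetical stand-ins for CLIP outputs:

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def distinctiveness(caption_emb, target_emb, distractor_embs):
    """Score how much better a caption matches the target image than a
    set of similar distractor images. A CLIP-like model would supply
    the embeddings; here they are plain vectors."""
    target_sim = cosine_sim(caption_emb, target_emb)
    distractor_sims = [cosine_sim(caption_emb, d) for d in distractor_embs]
    return target_sim - float(np.mean(distractor_sims))

# Toy embeddings standing in for CLIP image/text features (hypothetical).
target = np.array([1.0, 0.0, 0.2])
distractors = [np.array([0.9, 0.1, 0.1]), np.array([0.8, 0.2, 0.0])]

generic_caption = np.array([0.85, 0.1, 0.1])   # matches all images about equally
distinct_caption = np.array([0.9, 0.0, 0.4])   # picks out the target image

print(distinctiveness(generic_caption, target, distractors))
print(distinctiveness(distinct_caption, target, distractors))
```

Under this sketch, a generic caption scores near zero (it is as close to the distractors as to the target), while a distinctive caption scores higher.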

arxiv captioning clip cv image optimization
