Web: http://arxiv.org/abs/2206.10996

June 23, 2022, 1:13 a.m. | Delong Chen, Zhao Wu, Fan Liu, Zaiquan Yang, Yixiang Huang, Yiping Bao, Erjin Zhou

cs.CV updates on arXiv.org

Contrastive Language Image Pretraining (CLIP) has received widespread attention
because its learned representations transfer well to various downstream tasks.
During CLIP training, the InfoNCE objective aims to align positive image-text
pairs and separate negative ones. In this paper, we show a representation
grouping effect during this process: the InfoNCE objective indirectly groups
semantically similar representations together via randomly emerging
within-modal anchors. We introduce Prototypical Contrastive Language Image
Pretraining (ProtoCLIP) to enhance such grouping by boosting its efficiency and
increasing …
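For context, the InfoNCE objective referenced in the abstract is the standard symmetric contrastive loss CLIP trains with: each image embedding should score highest against its own caption, and vice versa. Below is a minimal PyTorch sketch under that assumption; the function name, embedding shapes, and temperature default are illustrative, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def clip_infonce_loss(image_emb: torch.Tensor,
                      text_emb: torch.Tensor,
                      temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_emb, text_emb: (batch, dim) tensors where row i of each
    tensor forms a positive image-text pair.
    """
    # Cosine similarity: L2-normalize, then take pairwise dot products.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (batch, batch)

    # Diagonal entries are positives; all off-diagonal entries are negatives.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)      # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)  # text -> image direction
    return (loss_i2t + loss_t2i) / 2
```

Every in-batch caption serves as a negative for every non-matching image, which is what drives the within-modal grouping effect the paper analyzes.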

