CoCa: Contrastive Captioners are Image-Text Foundation Models. (arXiv:2205.01917v1 [cs.CV])
Web: http://arxiv.org/abs/2205.01917
May 5, 2022, 1:10 a.m. | Jiahui Yu, Zirui Wang, Vijay Vasudevan, Legg Yeung, Mojtaba Seyedhosseini, Yonghui Wu
cs.CV updates on arXiv.org
Exploring large-scale pretrained foundation models is of significant interest in computer vision because these models can be quickly transferred to many downstream tasks. This paper presents Contrastive Captioner (CoCa), a minimalist design to pretrain an image-text encoder-decoder foundation model jointly with contrastive loss and captioning loss, thereby subsuming model capabilities from contrastive approaches like CLIP and generative methods like SimVLM. In contrast to standard encoder-decoder transformers where all decoder layers attend to encoder outputs, CoCa omits cross-attention in the first …
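The joint objective sketched in the abstract can be illustrated with a short example. The following minimal PyTorch sketch is not the authors' implementation: the ToyCoCa class, its toy image encoder, layer counts, pooling choice, and equal loss weighting are all illustrative assumptions. It simply pairs a CLIP-style contrastive loss between an image embedding and a unimodal text embedding with an autoregressive captioning loss from a multimodal decoder that cross-attends to the image.

```python
# A minimal sketch of a CoCa-style joint objective (assumptions throughout;
# module names, sizes, and loss weighting are illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

def causal_mask(size):
    # Float mask with -inf above the diagonal, so each position attends only to the past.
    return torch.full((size, size), float("-inf")).triu(1)

class ToyCoCa(nn.Module):
    def __init__(self, vocab_size=1000, dim=256, n_unimodal=2, n_multimodal=2):
        super().__init__()
        # Stand-in for the paper's image encoder.
        self.image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))
        self.token_emb = nn.Embedding(vocab_size, dim)
        # First half of the text decoder: self-attention only (unimodal text representation).
        self.unimodal = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), n_unimodal)
        # Second half: cross-attends to image features (multimodal representation).
        self.multimodal = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, nhead=4, batch_first=True), n_multimodal)
        self.to_logits = nn.Linear(dim, vocab_size)
        self.temperature = nn.Parameter(torch.tensor(0.07))

    def forward(self, images, tokens):
        img = F.normalize(self.image_encoder(images), dim=-1)          # (B, D)
        seq = self.unimodal(self.token_emb(tokens),
                            mask=causal_mask(tokens.size(1)))          # (B, T, D)
        # Last position stands in for the paper's pooled text embedding.
        txt = F.normalize(seq[:, -1], dim=-1)
        # Contrastive loss (CLIP-style) between image and unimodal text embeddings.
        logits = img @ txt.t() / self.temperature
        labels = torch.arange(images.size(0))
        contrastive = (F.cross_entropy(logits, labels)
                       + F.cross_entropy(logits.t(), labels)) / 2
        # Captioning loss: the multimodal half predicts the next token given image features.
        dec = self.multimodal(seq[:, :-1], memory=img.unsqueeze(1),
                              tgt_mask=causal_mask(tokens.size(1) - 1))
        captioning = F.cross_entropy(self.to_logits(dec).transpose(1, 2), tokens[:, 1:])
        return contrastive + captioning  # equal weighting here; the paper tunes these weights

# Quick shape check with random data.
model = ToyCoCa()
loss = model(torch.randn(4, 3, 32, 32), torch.randint(0, 1000, (4, 16)))
loss.backward()
```

Splitting the decoder this way lets one text stack serve both objectives: the cross-attention-free first half produces a text-only embedding for the contrastive term, and the second half reuses those states, together with the image features, for caption generation.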