Web: http://arxiv.org/abs/2209.06430

Sept. 26, 2022, 1:14 a.m. | Hongwei Xue, Yuchong Sun, Bei Liu, Jianlong Fu, Ruihua Song, Houqiang Li, Jiebo Luo

cs.CV updates on arXiv.org

Pre-trained image-text models such as CLIP have demonstrated the strong
power of vision-language representations learned from large-scale
web-collected image-text data. Building on these well-learned visual features,
some existing works transfer image representations to the video domain and
achieve good results. However, how to use an image-language pre-trained model
(e.g., CLIP) for video-language pre-training (post-pretraining) is still
underexplored. In this paper, we investigate two questions: 1) what are the
factors hindering post-pretraining CLIP from further improving the performance …
