March 29, 2024, 4:46 a.m. | Youbo Lei, Feifei He, Chen Chen, Yingbin Mo, Si Jia Li, Defeng Xie, Haonan Lu

cs.CV updates on arXiv.org

arXiv:2310.19654v2 Announce Type: replace
Abstract: Given the success of large-scale visual-language pretraining (VLP) models and the widespread industrial use of image-text retrieval, it is now critically important to reduce model size and streamline deployment on mobile devices. Single- and dual-stream model structures are commonly used in image-text retrieval to close the semantic gap between the textual and visual modalities. While single-stream models use deep feature fusion to achieve more accurate cross-modal alignment, dual-stream models are …
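The dual-stream design mentioned in the abstract can be illustrated with a minimal sketch: each modality is encoded independently, and retrieval reduces to a similarity lookup between the two embedding spaces. The example below uses random stand-in encoder outputs (not the paper's actual model) purely to show the mechanics.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Normalize embeddings so the dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

rng = np.random.default_rng(0)

# Stand-ins for the outputs of two independent encoders (in a real
# dual-stream model these would come from, e.g., an image backbone
# and a text backbone trained with a contrastive objective).
image_emb = l2_normalize(rng.normal(size=(4, 128)))  # 4 images
text_emb = l2_normalize(rng.normal(size=(6, 128)))   # 6 captions

# Dual-stream retrieval: one matrix multiply scores every image-text
# pair, which is why dual encoders scale well to large corpora.
scores = image_emb @ text_emb.T        # shape (4, 6)
best_caption = scores.argmax(axis=1)   # top-1 caption index per image
```

In contrast, a single-stream model would feed each image-text pair jointly through a fusion network, so scoring a corpus requires one forward pass per pair rather than one per item, trading efficiency for finer-grained alignment.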
