Sept. 20, 2022, 1:13 a.m. | Juncheng Li, Xin He, Longhui Wei, Long Qian, Linchao Zhu, Lingxi Xie, Yueting Zhuang, Qi Tian, Siliang Tang

cs.CV updates on arXiv.org

Large-scale vision-language pre-training has shown impressive advances in a
wide range of downstream tasks. Existing methods mainly model the cross-modal
alignment by the similarity of the global representations of images and texts,
or advanced cross-modal attention upon image and text features. However, they
fail to explicitly learn the fine-grained semantic alignment between visual
regions and textual phrases, as only global image-text alignment information is
available. In this paper, we introduce LOUPE, a fine-grained semantically
aLigned visiOn-langUage PrE-training framework, which learns …
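To make the distinction concrete, the following sketch contrasts global image-text alignment (a single similarity score between pooled embeddings) with fine-grained region-phrase alignment. The shapes, the max-over-regions pooling, and all variable names are illustrative assumptions for exposition, not LOUPE's actual formulation.

```python
import numpy as np

def normalize(x):
    # L2-normalize along the last axis so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Global alignment (what most prior methods model):
# one embedding per image and per caption, one scalar score.
image_global = normalize(rng.normal(size=(1, 256)))
text_global = normalize(rng.normal(size=(1, 256)))
global_score = float(image_global @ text_global.T)

# Fine-grained alignment (hypothetical shapes): 4 visual-region embeddings
# and 3 textual-phrase embeddings; score each phrase by its best region.
regions = normalize(rng.normal(size=(4, 256)))
phrases = normalize(rng.normal(size=(3, 256)))
sim = regions @ phrases.T            # (4, 3) region-phrase cosine matrix
fine_score = sim.max(axis=0).mean()  # max over regions, mean over phrases

print(global_score, fine_score)
```

The point of the contrast: the global score collapses the whole image and caption into one number, whereas the region-phrase matrix exposes which visual region supports which phrase, which is the kind of signal fine-grained pre-training aims to learn.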

