June 20, 2022, 1:13 a.m. | Jiaxi Gu, Xiaojun Meng, Guansong Lu, Lu Hou, Minzhe Niu, Xiaodan Liang, Lewei Yao, Runhui Huang, Wei Zhang, Xin Jiang, Chunjing Xu, Hang Xu

cs.CV updates on arXiv.org

Vision-Language Pre-training (VLP) models have shown remarkable performance
on various downstream tasks. Their success relies heavily on the scale of the
cross-modal datasets used for pre-training. However, the lack of large-scale datasets and
benchmarks in Chinese hinders the development of Chinese VLP models and broader
multilingual applications. In this work, we release a large-scale Chinese
cross-modal dataset named Wukong, which contains 100 million Chinese image-text
pairs collected from the web. Wukong aims to benchmark different multi-modal
pre-training methods to facilitate the VLP research …
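Datasets like Wukong are typically used to train CLIP-style models with a symmetric contrastive (InfoNCE) objective over batches of image-text pairs. The sketch below illustrates that generic objective with NumPy; it is a simplified illustration of the common VLP training loss, not Wukong's or any specific model's actual training code, and all names here are made up for the example.

```python
import numpy as np

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    A generic sketch of the contrastive objective used by CLIP-style
    vision-language models; row i of img_emb is assumed to be paired
    with row i of txt_emb.
    """
    # L2-normalize so dot products become cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature       # (B, B) similarity matrix
    labels = np.arange(len(logits))          # matched pairs sit on the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

With embeddings where each image best matches its own caption, the loss is low; shuffling the pairing raises it, which is exactly the signal that drives cross-modal alignment during pre-training.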

