April 2, 2024, 7:49 p.m. | Zhiyu Zhao, Bingkun Huang, Sen Xing, Gangshan Wu, Yu Qiao, Limin Wang

cs.CV updates on arXiv.org

arXiv:2311.03149v2 Announce Type: replace
Abstract: Self-supervised foundation models have shown great potential in computer vision thanks to the pre-training paradigm of masked autoencoding. Scale is a primary factor influencing the performance of these foundation models, but large foundation models incur high computational costs. This paper focuses on pre-training relatively small vision transformer models that can be efficiently adapted to downstream tasks. Specifically, taking inspiration from knowledge distillation in model compression, we propose a new asymmetric masked distillation …
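The abstract only sketches the idea, so below is a minimal, self-contained sketch of what an asymmetric masked-distillation step could look like in PyTorch. Everything in it is illustrative: `TinyEncoder`, `amd_step`, `proj`, and the mask ratios (0.90 for the student, 0.75 for the teacher) are assumptions for the sketch, not the authors' implementation, and positional embeddings and the masked-autoencoding reconstruction branch are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyEncoder(nn.Module):
    """Stand-in ViT-style encoder: linear patch embedding + transformer blocks."""

    def __init__(self, patch_dim: int, dim: int, depth: int = 2, heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        return self.blocks(self.embed(patches))


def amd_step(student, teacher, proj, patches,
             student_mask_ratio: float = 0.90, teacher_mask_ratio: float = 0.75):
    """One asymmetric masked-distillation step (illustrative only).

    The teacher is masked less aggressively (sees more visible tokens) than the
    student, and the student's features on the tokens both can see are aligned
    to the teacher's. Mask ratios are placeholders, not the paper's settings.
    """
    n = patches.shape[1]
    perm = torch.randperm(n, device=patches.device)
    teacher_keep = perm[: int(n * (1 - teacher_mask_ratio))]          # teacher's visible tokens
    student_keep = teacher_keep[: int(n * (1 - student_mask_ratio))]  # a subset of them

    with torch.no_grad():
        t_feat = teacher(patches[:, teacher_keep])    # [B, |teacher_keep|, D_t], no gradient
    s_feat = proj(student(patches[:, student_keep]))  # project student width to teacher width

    # student_keep is a prefix of teacher_keep, so the first |student_keep|
    # teacher outputs correspond to the same input tokens the student saw.
    return F.smooth_l1_loss(s_feat, t_feat[:, : student_keep.numel()])


if __name__ == "__main__":
    patches = torch.randn(2, 196, 16 * 16 * 3)      # dummy 14x14 grid of 16x16 RGB patches
    student = TinyEncoder(patch_dim=768, dim=192)   # smaller model being pre-trained
    teacher = TinyEncoder(patch_dim=768, dim=384)   # larger teacher (randomly initialized here)
    proj = nn.Linear(192, 384)                      # aligns student width to teacher width
    loss = amd_step(student, teacher, proj, patches)
    loss.backward()
    print(float(loss))
```

In an actual pre-training setup the teacher would presumably be a frozen, already pre-trained larger model, and this alignment loss would be combined with the masked-autoencoding reconstruction objective rather than used on its own.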

