May 20, 2022, 1:10 a.m. | Peng Gao, Teli Ma, Hongsheng Li, Ziyi Lin, Jifeng Dai, Yu Qiao

cs.CV updates on arXiv.org

Vision Transformers (ViT) have become widely adopted architectures for various
vision tasks. Masked auto-encoding for feature pretraining and multi-scale
hybrid convolution-transformer architectures can further unleash the potential
of ViT, leading to state-of-the-art performance on image classification,
detection, and semantic segmentation. In this paper, our ConvMAE framework
demonstrates that multi-scale hybrid convolution-transformer architectures can
learn more discriminative representations via the masked auto-encoding scheme.
However, directly using the original masking strategy leads to heavy
computational cost and a pretraining-finetuning discrepancy. To tackle the issue, we …

