Positional Label for Self-Supervised Vision Transformer. (arXiv:2206.04981v2 [cs.CV] UPDATED)
Aug. 16, 2022, 1:13 a.m. | Zhemin Zhang, Xun Gong
cs.CV updates on arXiv.org arxiv.org
Positional encoding is important for the vision transformer (ViT) to capture the
spatial structure of the input image, and its general effectiveness in ViT has
been demonstrated. In our work, we propose training ViT to recognize the
positional label of patches of the input image; this apparently simple task
actually yields a meaningful self-supervisory signal. Building on previous work
on ViT positional encoding, we propose two positional labels dedicated to 2D
images: absolute position and relative position. Our positional labels can …
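The abstract is truncated, so the exact label construction is not given here. As an illustration only, a minimal sketch of what such labels could look like, assuming the standard ViT patch grid (e.g. a 224×224 image split into 16×16 patches): an absolute label is a patch's flattened grid index, and a relative label encodes the 2D offset between a pair of patches as a single class. All function names and the offset-to-class mapping below are assumptions, not the paper's definitions.

```python
import numpy as np

def absolute_position_labels(img_h, img_w, patch):
    # Absolute label: the flattened grid index of each patch,
    # label k = row * cols + col (hypothetical construction).
    rows, cols = img_h // patch, img_w // patch
    return np.arange(rows * cols)

def relative_position_labels(img_h, img_w, patch):
    # Relative label for every patch pair (i, j): the 2D offset
    # (dr, dc) shifted to be non-negative, then mapped to one
    # class index per distinct offset (hypothetical construction).
    rows, cols = img_h // patch, img_w // patch
    r, c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    r, c = r.ravel(), c.ravel()
    dr = r[:, None] - r[None, :] + (rows - 1)  # in [0, 2*rows-2]
    dc = c[:, None] - c[None, :] + (cols - 1)  # in [0, 2*cols-2]
    return dr * (2 * cols - 1) + dc

# 224x224 image with 16x16 patches -> a 14x14 grid of 196 patches
abs_labels = absolute_position_labels(224, 224, 16)   # shape (196,)
rel_labels = relative_position_labels(224, 224, 16)   # shape (196, 196)
```

Under this sketch, the self-supervised task would be a per-patch (or per-pair) classification: a head on the ViT patch embeddings predicts these label indices, so the model must recover spatial structure without image-level annotations.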