Patch-level Representation Learning for Self-supervised Vision Transformers. (arXiv:2206.07990v1 [cs.CV])
cs.LG updates on arXiv.org
Recent self-supervised learning (SSL) methods have shown impressive results
in learning visual representations from unlabeled images. This paper aims to
improve their performance further by exploiting the architectural advantages of
the underlying neural network; the current state-of-the-art visual pretext
tasks for SSL are architecture-agnostic and therefore do not benefit from them.
In particular, we focus on Vision Transformers (ViTs), which have gained much
attention recently as a better architectural choice, often outperforming
convolutional networks for various visual tasks. The …
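The abstract is truncated before the method is described, but the patch-level framing rests on a standard property of ViTs: the image is split into non-overlapping patches, and each patch gets its own token representation. The following is a minimal NumPy sketch of that patchification step only, with hypothetical sizes (a 32x32 RGB image, 8x8 patches, 64-dim embeddings); it is not the paper's pretext task.

```python
import numpy as np

# Illustrative sketch, not the paper's method: how a ViT turns an image into
# a sequence of patch tokens, each carrying a patch-level representation.
# Hypothetical sizes: 32x32 RGB image, 8x8 patches, 64-dim embeddings.
rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))
patch, dim = 8, 64

# Split the image into a 4x4 grid of non-overlapping 8x8 patches,
# then flatten each patch into a vector of 8*8*3 = 192 values.
patches = image.reshape(4, patch, 4, patch, 3).transpose(0, 2, 1, 3, 4)
patches = patches.reshape(-1, patch * patch * 3)   # shape (16, 192)

# A (here randomly initialized, in practice learned) linear projection
# maps each flattened patch to an embedding token.
W = rng.standard_normal((patch * patch * 3, dim)) / np.sqrt(patch * patch * 3)
tokens = patches @ W                               # shape (16, 64)

print(tokens.shape)  # one 64-dim token per patch
```

Because every patch keeps its own token throughout the transformer, an SSL objective can be attached at the patch level rather than only to a single global image representation, which is the architectural hook the abstract alludes to.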