Web: http://arxiv.org/abs/2201.08683

Jan. 24, 2022, 2:10 a.m. | Kishaan Jeeveswaran, Senthilkumar Kathiresan, Arnav Varma, Omar Magdy, Bahram Zonooz, Elahe Arani

cs.CV updates on arXiv.org

Convolutional Neural Networks (CNNs), architectures consisting of
convolutional layers, have been the standard choice in vision tasks. Recent
studies have shown that Vision Transformers (VTs), architectures based on
self-attention modules, achieve comparable performance in challenging tasks
such as object detection and semantic segmentation. However, the image
processing mechanism of VTs is different from that of conventional CNNs. This
poses several questions about their generalizability, robustness, reliability,
and texture bias when used to extract features for complex tasks. To address
these …
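To make the contrast concrete: where a convolutional layer mixes only a local neighborhood of pixels, a self-attention module mixes every token with every other token, weighted by learned similarity. A minimal single-head sketch in pure Python (identity Q/K/V projections for brevity; real Vision Transformers use learned projection matrices and multiple heads):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(X):
    # X: list of n token vectors, each a list of d floats.
    # Identity Q/K/V projections keep the sketch short.
    n, d = len(X), len(X[0])
    out = []
    for i in range(n):
        # Scaled dot-product similarity of token i with every token j.
        scores = [
            sum(X[i][k] * X[j][k] for k in range(d)) / math.sqrt(d)
            for j in range(n)
        ]
        w = softmax(scores)
        # Output for token i: attention-weighted sum of all tokens.
        out.append([sum(w[j] * X[j][k] for j in range(n)) for k in range(d)])
    return out
```

Because every output token is a convex combination of all input tokens, the receptive field is global from the first layer, unlike a CNN, whose receptive field grows only with depth.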

