Nov. 22, 2022, 2:12 a.m. | Haoran You, Yunyang Xiong, Xiaoliang Dai, Bichen Wu, Peizhao Zhang, Haoqi Fan, Peter Vajda, Yingyan Lin

cs.CV updates on arXiv.org

Vision Transformers (ViTs) have shown impressive performance but still incur a
high computation cost compared to convolutional neural networks (CNNs), because
their global similarity measurements lead to quadratic complexity in the number
of input tokens. Existing efficient ViTs adopt local attention (e.g., Swin) or
linear attention (e.g., Performer), which sacrifices the ViTs' ability to
capture either global or local context. In this work, we ask an important
research question: Can ViTs learn both global and local context while being
more …
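
The complexity gap the abstract refers to can be illustrated with a small sketch: standard softmax attention materializes an N x N score matrix, so its cost grows quadratically with the number of tokens, while a Performer-style kernelized linear attention reassociates the matrix product to avoid that matrix entirely. The feature map and shapes below are illustrative assumptions, not the paper's method.

    # Minimal sketch contrasting quadratic softmax attention with a
    # kernelized linear-attention approximation (assumed elu-like feature map).
    import numpy as np

    def softmax_attention(Q, K, V):
        # Scores form an (N x N) matrix, so cost is quadratic in token count N.
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V

    def linear_attention(Q, K, V):
        # A positive feature map plus reassociation avoids the (N x N) matrix:
        # cost is linear in N.
        phi = lambda x: np.maximum(x, 0) + 1e-6   # simple positive feature map (assumption)
        Qf, Kf = phi(Q), phi(K)
        kv = Kf.T @ V                              # (d x d) summary of keys and values
        z = Qf @ Kf.sum(axis=0, keepdims=True).T   # (N x 1) normalization term
        return (Qf @ kv) / z

    N, d = 196, 64                                 # e.g., 14 x 14 patch tokens
    Q, K, V = (np.random.randn(N, d) for _ in range(3))
    print(softmax_attention(Q, K, V).shape)        # (196, 64)
    print(linear_attention(Q, K, V).shape)         # (196, 64)
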

Tags: angular, arxiv, attention, inference, linear, self-attention, transformer, vision
