Oct. 20, 2022, 1:16 a.m. | Tan Yu, Gangming Zhao, Ping Li, Yizhou Yu

cs.CV updates on arXiv.org

Vision Transformers have achieved outstanding performance in many computer vision tasks. Early Vision Transformers such as ViT and DeiT adopt global self-attention, which is computationally expensive when the number of patches is large. To improve efficiency, recent Vision Transformers adopt local self-attention mechanisms, where self-attention is computed within local windows. Although window-based local self-attention significantly boosts efficiency, it fails to capture the relationships between distant but similar patches in the image plane. To overcome this limitation of image-space …

Tags: arxiv, attention, local attention, transformer, vision
