Nov. 5, 2023, 6:43 a.m. | Badri N. Patro, Vijay Srinivas Agneeswaran

cs.LG updates on arXiv.org arxiv.org

Vision transformers have gained significant attention and achieved
state-of-the-art performance in various computer vision tasks, including image
classification, instance segmentation, and object detection. However,
challenges remain in addressing attention complexity and effectively capturing
fine-grained information within images. Existing solutions often resort to
down-sampling operations, such as pooling, to reduce computational cost.
Unfortunately, such operations are non-invertible and can result in information
loss. In this paper, we present a novel approach called Scattering Vision
Transformer (SVT) to tackle these challenges. SVT …

art arxiv attention challenges classification complexity computational computer computer vision cost detection fine-grained image images information instance operations performance pooling reduce sampling segmentation solutions state tasks transformer transformers vision vision transformers

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Business Data Scientist, gTech Ads

@ Google | Mexico City, CDMX, Mexico

Lead, Data Analytics Operations

@ Zocdoc | Pune, Maharashtra, India