April 23, 2024, 4:43 a.m. | Yang He, Joey Tianyi Zhou

cs.LG updates on arXiv.org

arXiv:2404.13648v1 Announce Type: cross
Abstract: Hierarchical vision transformers (ViTs) have two advantages over conventional ViTs. First, hierarchical ViTs achieve linear computational complexity with respect to image size via local self-attention. Second, hierarchical ViTs produce hierarchical feature maps by merging image patches in deeper layers, which supports dense prediction. However, existing pruning methods ignore these unique properties of hierarchical ViTs and use weight magnitude as the importance measure. This approach leads to two main drawbacks. First, the "local" attention weights are …
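
To make the abstract's two properties concrete, here is a minimal, illustrative PyTorch sketch (not the paper's method): window-local self-attention partitioning, whose per-window cost is fixed so total cost grows linearly with image size; Swin-style patch merging, which builds the hierarchical feature maps; and the magnitude-based weight scoring that the abstract says existing pruning methods rely on. All names below (window_partition, PatchMerging, magnitude_prune_mask) are assumptions for illustration, not code from the paper.

```python
# Illustrative sketch only; all names are assumptions, not the paper's code.
import torch
import torch.nn as nn


def window_partition(x, window_size):
    """Split a (B, H, W, C) feature map into non-overlapping windows.

    Attention is then computed inside each window, so per-window cost is
    constant and total cost scales linearly with H * W.
    """
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)


class PatchMerging(nn.Module):
    """Merge 2x2 neighboring patches: halve spatial size, grow channel dim.

    Stacking such stages yields the hierarchical feature maps used for
    dense prediction.
    """

    def __init__(self, dim):
        super().__init__()
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x):  # x: (B, H, W, C) with even H, W
        x = torch.cat(
            [x[:, 0::2, 0::2, :], x[:, 1::2, 0::2, :],
             x[:, 0::2, 1::2, :], x[:, 1::2, 1::2, :]], dim=-1)
        return self.reduction(x)  # -> (B, H/2, W/2, 2*C)


def magnitude_prune_mask(weight, sparsity=0.5):
    """Magnitude criterion the abstract critiques: keep the largest-|w| entries."""
    k = max(1, int(weight.numel() * (1.0 - sparsity)))
    keep = weight.abs().flatten().topk(k).indices
    mask = torch.zeros(weight.numel(), device=weight.device)
    mask[keep] = 1.0
    return mask.view_as(weight)


if __name__ == "__main__":
    x = torch.randn(1, 8, 8, 32)                      # toy stage-1 feature map
    print(window_partition(x, window_size=4).shape)   # (4, 16, 32)
    print(PatchMerging(32)(x).shape)                  # (1, 4, 4, 64)
    w = torch.randn(64, 64)
    print(magnitude_prune_mask(w).sum() / w.numel())  # ~0.5 of weights kept
```

The toy run shows the linear-cost intuition (more windows, not larger attention matrices, as the image grows) and the resolution/channel trade of patch merging; the pruning mask is the plain magnitude baseline that the abstract argues is ill-suited to these hierarchical, locally attended weights.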
