Jan. 31, 2024, 4:42 p.m. | Seokju Yun, Youngmin Ro

cs.CV updates on arXiv.org

Recently, efficient Vision Transformers have shown strong performance with low
latency on resource-constrained devices. Conventionally, they use 4x4 patch
embeddings and a 4-stage structure at the macro level, while utilizing
sophisticated attention with a multi-head configuration at the micro level. This
paper aims to address computational redundancy at all design levels in a
memory-efficient manner. We discover that using a larger-stride patchify stem
not only reduces memory access costs but also achieves competitive performance
by leveraging token representations with reduced spatial redundancy …
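To make the macro-level point concrete, the sketch below contrasts a conventional 4x4 patchify stem with a larger-stride variant. This is a minimal PyTorch illustration under assumed settings, not the authors' implementation: the `PatchifyStem` module name, the 64-channel width, and the stride-16 choice are all hypothetical.

```python
import torch
import torch.nn as nn

class PatchifyStem(nn.Module):
    """Non-overlapping patch embedding: a single strided convolution.

    Hypothetical sketch; module name, channel width, and stride values
    are illustrative assumptions, not the paper's actual code.
    """
    def __init__(self, in_chans=3, embed_dim=64, patch_size=4):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        # (B, 3, H, W) -> (B, embed_dim, H/patch_size, W/patch_size)
        return self.proj(x)

x = torch.randn(1, 3, 224, 224)

# Conventional stem: 4x4 patches -> a 56x56 grid = 3136 tokens per image.
conv_stem = PatchifyStem(patch_size=4)
print(conv_stem(x).shape)   # torch.Size([1, 64, 56, 56])

# Larger-stride stem: 16x16 patches -> a 14x14 grid = 196 tokens, i.e. 16x
# fewer tokens for every subsequent layer to read and write, which is where
# the reduction in memory access cost comes from.
large_stem = PatchifyStem(patch_size=16)
print(large_stem(x).shape)  # torch.Size([1, 64, 14, 14])
```

The token counts above are simple arithmetic (224/4 = 56, 224/16 = 14); the abstract's claim is that the coarser token grid discards spatial redundancy rather than useful signal, so accuracy stays competitive while memory traffic drops.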

