Memory-Efficient Vision Transformers: An Activation-Aware Mixed-Rank Compression Strategy | allainews.com

Feb. 12, 2024, 5:44 a.m. | Seyedarmin Azizi, Mahdi Nazemi, Massoud Pedram

stat.ML updates on arXiv.org (arxiv.org)

As Vision Transformers (ViTs) increasingly set new benchmarks in computer vision, their practical deployment on inference engines is often hindered by their significant memory bandwidth and (on-chip) memory footprint requirements. This paper addresses this memory limitation by introducing an activation-aware model compression methodology that uses selective low-rank weight tensor approximations of different layers to reduce the parameter count of ViTs. The key idea is to decompose the weight tensors into a sum of two parameter-efficient tensors while minimizing the error …
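The abstract is cut off before it states the exact error objective, so what follows is only a minimal sketch of one common way to make a low-rank weight approximation "activation-aware". It is not the authors' method. It assumes a linear layer y = x @ W and a batch of calibration activations X, and it picks rank-k factors A, B that minimize ||X W - X A B||_F (the error in the layer's outputs) instead of ||W - A B||_F (the error in the weights). The function names, the synthetic calibration data, and the QR-based closed-form solution are illustrative assumptions. The per-layer rank selection (the "mixed-rank" part) and the paper's sum-of-two-tensors decomposition are not shown.

```python
import numpy as np


def activation_aware_low_rank(W, X, k):
    """Hypothetical helper: rank-k factors A @ B so that X @ (A @ B) stays close to X @ W.

    Assumes a linear layer y = x @ W with W of shape (d_in, d_out) and
    calibration activations X of shape (n, d_in), n >= d_in, X of full column rank.
    This is a generic activation-weighted SVD, not the paper's exact algorithm.
    """
    # Reduced QR: X = Q @ R with orthonormal Q, so ||X @ E||_F == ||R @ E||_F.
    _, R = np.linalg.qr(X, mode="reduced")
    # Best rank-k approximation of R @ W (Eckart-Young) via truncated SVD.
    U, S, Vt = np.linalg.svd(R @ W, full_matrices=False)
    # Map back through R^{-1}; R is invertible because X has full column rank.
    A = np.linalg.solve(R, U[:, :k] * S[:k])  # shape (d_in, k)
    B = Vt[:k, :]                              # shape (k, d_out)
    return A, B


def plain_low_rank(W, k):
    """Activation-agnostic baseline: truncated SVD of W itself."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k] * S[:k], Vt[:k, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_in, d_out, n, k = 64, 64, 512, 8
    W = rng.standard_normal((d_in, d_out))
    # Synthetic, anisotropic activations: a few input channels carry most of the energy.
    scales = np.ones(d_in)
    scales[:4] = 20.0
    X = rng.standard_normal((n, d_in)) * scales

    A, B = activation_aware_low_rank(W, X, k)
    P, Q = plain_low_rank(W, k)
    err_aware = np.linalg.norm(X @ W - X @ (A @ B))
    err_plain = np.linalg.norm(X @ W - X @ (P @ Q))
    print(f"params: {W.size} -> {A.size + B.size}")
    print(f"output error, activation-aware: {err_aware:.1f}")
    print(f"output error, plain SVD:        {err_plain:.1f}")
```

Factoring a d_in × d_out matrix into rank-k factors stores k(d_in + d_out) parameters instead of d_in·d_out, so the saving exists only when k < d_in·d_out / (d_in + d_out). Weighting the error by X spends the rank budget on the input directions where the activations actually carry energy, which is why, on the same calibration data, the activation-aware output error is never larger than that of plain SVD truncation.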
