Feb. 12, 2024, 5:44 a.m. | Seyedarmin Azizi, Mahdi Nazemi, Massoud Pedram

stat.ML updates on arXiv.org

As Vision Transformers (ViTs) increasingly set new benchmarks in computer vision, their practical deployment on inference engines is often hindered by their significant memory bandwidth and (on-chip) memory footprint requirements. This paper addresses this memory limitation by introducing an activation-aware model compression methodology that uses selective low-rank weight tensor approximations of different layers to reduce the parameter count of ViTs. The key idea is to decompose the weight tensors into a sum of two parameter-efficient tensors while minimizing the error …
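To make the parameter savings of a low-rank weight approximation concrete, here is a minimal sketch using a plain truncated SVD in NumPy. It is not the paper's method: the abstract describes an activation-aware decomposition into a sum of two tensors, while this sketch only factors a single matrix into a product of two thin factors. The layer size (768 x 768), the rank (64), the random weights, and the helper names `low_rank_factors` and `param_count` are all assumptions chosen for illustration.

```python
import numpy as np


def low_rank_factors(weight, rank):
    """Return thin factors (a, b) such that a @ b is the best rank-`rank`
    approximation of `weight` in the Frobenius norm (truncated SVD)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # shape (out_features, rank), singular values folded in
    b = vt[:rank, :]            # shape (rank, in_features)
    return a, b


def param_count(*arrays):
    """Total number of stored scalars across the given arrays."""
    return sum(arr.size for arr in arrays)


# Hypothetical layer: random values stand in for a trained projection matrix.
rng = np.random.default_rng(0)
weight = rng.standard_normal((768, 768))

a, b = low_rank_factors(weight, rank=64)

dense_params = param_count(weight)   # 768 * 768 = 589,824
factored_params = param_count(a, b)  # 2 * 768 * 64 = 98,304

# Relative Frobenius error of the approximation. For random weights this
# number says nothing about trained ViT layers.
rel_error = np.linalg.norm(weight - a @ b) / np.linalg.norm(weight)
```

At inference time the dense product `x @ weight.T` would be replaced by `(x @ b.T) @ a.T`, so only the two thin factors need to be stored and fetched from memory. How much rank each layer can give up without hurting accuracy is exactly the kind of per-layer choice the abstract's "selective" approximation refers to; this sketch does not attempt it.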

