Feb. 13, 2024, 5:44 a.m. | Arnav Kundu, Chungkuk Yoo, Srijan Mishra, Minsik Cho, Saurabh Adya

cs.LG updates on arXiv.org arxiv.org

Model quantization and compression are widely used techniques for reducing the use of computing resources at inference time. While state-of-the-art works have achieved reasonable accuracy at higher bit widths such as 4-bit or 8-bit, it is still challenging to quantize or compress a model further, e.g., to 1-bit or 2-bit. To overcome this challenge, we focus on outliers in the weights of a pre-trained model, which disrupt effective lower-bit quantization and compression. In this work, we propose Range Restriction Loss (R2-Loss) for building …
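
To make the motivation concrete, the sketch below shows how a single outlier weight can degrade low-bit uniform quantization. It is not the authors' R2-Loss, whose definition is cut off in the excerpt above. It is a minimal illustration that assumes symmetric min-max uniform quantization. The helper names (`quantize_uniform`, `mse`) and the synthetic weights are hypothetical.

```python
import random


def quantize_uniform(weights, bits):
    """Symmetric min-max uniform quantization (assumed scheme, bits >= 2).

    Returns the dequantized values so they can be compared to the originals.
    """
    levels = 2 ** (bits - 1) - 1            # bits=2 gives the grid {-s, 0, +s}
    max_abs = max(abs(w) for w in weights)  # the range is set by the largest weight
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
# Synthetic "well-behaved" weights: small values centred on zero.
weights = [random.gauss(0.0, 0.05) for _ in range(1000)]
# The same weights plus one large outlier that stretches the quantization range.
with_outlier = weights + [2.0]

q_clean = quantize_uniform(weights, bits=2)
q_outlier = quantize_uniform(with_outlier, bits=2)[:-1]  # drop the outlier itself

err_clean = mse(weights, q_clean)
err_outlier = mse(weights, q_outlier)

print("2-bit error without outlier:", err_clean)
print("2-bit error with outlier:   ", err_outlier)
```

With the outlier present, the quantization step is so large that every ordinary weight rounds to zero. This is the kind of range problem that restricting the weight range during training is meant to avoid.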

