Feb. 7, 2024, 5:44 a.m. | Wei Huang, Haotong Qin, Yangdong Liu, Jingzhuo Liang, Yulun Zhang, Ying Li, Xianglong Liu

cs.LG updates on arXiv.org

Quantization has emerged as one of the most promising approaches for deploying advanced deep models on resource-constrained hardware. Mixed-precision quantization leverages multiple bit-width architectures to unleash the accuracy and efficiency potential of quantized models. However, existing mixed-precision quantization suffers from an exhaustive search space that incurs immense computational overhead. The quantization process therefore relies on separate high-performance devices rather than running locally on the target hardware, which also leads to a significant gap between the hardware metrics considered during search and those of the real deployment. In this paper, we propose an On-chip …
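The abstract is truncated above, but the core problem it describes is concrete: mixed-precision quantization assigns a bit-width to every layer, so the number of candidate configurations grows exponentially with network depth. The minimal sketch below is not the paper's method; the uniform quantizer, bit-width choices, and toy tensor are illustrative assumptions meant only to show simulated quantization and the size of the resulting search space.

```python
import numpy as np

def quantize_uniform(x: np.ndarray, bits: int) -> np.ndarray:
    """Simulated symmetric uniform quantization of a tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8-bit
    scale = float(np.max(np.abs(x))) / qmax
    if scale == 0.0:                    # all-zero tensor: nothing to quantize
        return x.copy()
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale                    # de-quantized ("fake quantized") values

# Mixed precision assigns each layer its own bit-width, so the search space
# grows exponentially with depth: |bit choices| ** |layers|.
bit_choices = [2, 4, 8]
num_layers = 50
print(f"search space: {len(bit_choices) ** num_layers:.2e} configurations")

# Quantize a toy weight tensor at two precisions to see the error trade-off.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
for b in (8, 4):
    mse = float(np.mean((w - quantize_uniform(w, b)) ** 2))
    print(f"{b}-bit MSE: {mse:.6f}")
```

With 3 bit-width choices and 50 layers the space already exceeds 7e23 configurations, which is why exhaustive search is typically offloaded to separate high-performance devices rather than run on the deployment hardware itself.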
