Feb. 1, 2024, 12:42 p.m. | Dingyi Dai, Yichi Zhang, Jiahao Zhang, Zhanqiu Hu, Yaohui Cai, Qi Sun, Zhiru Zhang

cs.CV updates on arXiv.org

Quantization is a crucial technique for deploying deep learning models on resource-constrained devices such as embedded FPGAs. Prior efforts mostly focus on quantizing matrix multiplications, leaving other layers such as BatchNorm or shortcuts in floating-point form, even though fixed-point arithmetic is more efficient on FPGAs. A common practice is to fine-tune a pre-trained model to fixed-point for FPGA deployment, but this can degrade accuracy.
This work presents QFX, a novel trainable fixed-point quantization approach that automatically learns the binary-point position during model …
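
The abstract is truncated here, so QFX's exact formulation is not available in this excerpt. As a rough illustration of the general idea only, the sketch below shows a generic trainable fixed-point quantizer that learns its binary-point position by gradient descent, using a straight-through estimator for the rounding step. All names (`FixedPointQuant`, `frac_bits`, `total_bits`) are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

class FixedPointQuant(nn.Module):
    """Generic trainable fixed-point quantizer (illustrative sketch, not QFX itself).

    Values are represented as W-bit signed fixed-point with f fractional bits.
    f is kept as a continuous parameter so the binary-point position can be
    learned; rounding uses a straight-through estimator (STE).
    """

    def __init__(self, total_bits: int = 8, init_frac_bits: float = 4.0):
        super().__init__()
        self.total_bits = total_bits
        # Continuous relaxation of the fractional bit count (binary point).
        self.frac_bits = nn.Parameter(torch.tensor(init_frac_bits))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scale = torch.pow(2.0, self.frac_bits)        # 2^f
        qmax = 2.0 ** (self.total_bits - 1) - 1       # signed integer range
        qmin = -(2.0 ** (self.total_bits - 1))
        scaled = x * scale
        # STE: round in the forward pass, pass gradients through unchanged.
        rounded = scaled + (torch.round(scaled) - scaled).detach()
        clipped = torch.clamp(rounded, qmin, qmax)
        return clipped / scale

# Usage: quantize a tensor; frac_bits receives gradients during training.
quant = FixedPointQuant(total_bits=8, init_frac_bits=4.0)
x = torch.randn(4, 16)
x_q = quant(x)
```

Keeping the fractional bit count as a continuous parameter is one common way to make the binary point differentiable; in an actual fixed-point deployment it would be rounded to an integer before synthesis.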

cs.cv cs.lg deep learning embedded fixed-point fpgas quantization
