ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization. (arXiv:2208.14286v1 [cs.LG])
cs.LG updates on arXiv.org
Quantization is a technique for reducing the computation and memory cost of DNN
models, which are growing increasingly large. Existing quantization solutions
use fixed-point integer or floating-point types, whose benefits are limited:
both require extra bits to maintain the accuracy of the original models.
Variable-length quantization, by contrast, applies low-bit quantization to
normal values and keeps high precision for a small fraction of outlier values.
Although this line of work brings algorithmic benefits, it also introduces
significant hardware overheads due …
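To make the contrast in the abstract concrete, here is a minimal sketch of the two schemes it mentions: uniform low-bit (4-bit) fixed-point quantization, where one large outlier inflates the scale for every value, versus a generic outlier-aware variable-length scheme that keeps a small fraction of large-magnitude values at full precision. This is an illustrative sketch only, not ANT's actual adaptive data type; all function names and the 1% outlier fraction are assumptions for the example.

```python
import numpy as np

def quantize_int4(x):
    """Uniform symmetric 4-bit quantization; scale is set by the max |x|,
    so a single outlier coarsens the grid for all values."""
    scale = np.abs(x).max() / 7.0              # int4 range [-8, 7]
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

def quantize_outlier_aware(x, outlier_frac=0.01):
    """Variable-length sketch: store the top-|outlier_frac| magnitudes at
    full precision and quantize the remaining values to 4 bits, with the
    scale set by the non-outlier maximum."""
    k = max(1, int(len(x) * outlier_frac))
    idx = np.argsort(np.abs(x))[-k:]           # indices of the outliers
    mask = np.zeros(len(x), dtype=bool)
    mask[idx] = True
    inliers = np.where(mask, 0.0, x)
    scale = np.abs(inliers).max() / 7.0
    q = np.clip(np.round(inliers / scale), -8, 7).astype(np.int8)
    return q, scale, idx, x[idx]

def reconstruct(q, scale, idx, outliers):
    y = q.astype(np.float32) * scale
    y[idx] = outliers                          # restore outliers exactly
    return y

# With one large outlier, the outlier-aware scheme uses a much finer grid
# for the remaining values, so its reconstruction error is far lower.
rng = np.random.default_rng(0)
x = rng.normal(size=1000).astype(np.float32)
x[0] = 50.0                                    # a single outlier value
q, s = quantize_int4(x)
err_uniform = np.abs(dequantize(q, s) - x).mean()
q2, s2, idx, out = quantize_outlier_aware(x)
err_outlier = np.abs(reconstruct(q2, s2, idx, out) - x).mean()
```

The sketch also shows the hardware cost the abstract alludes to: the outlier path needs extra per-tensor metadata (indices plus full-precision values) and an irregular scatter on reconstruction, which is cheap in NumPy but costly in fixed-function hardware.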