Web: http://arxiv.org/abs/2107.10998

Jan. 27, 2022, 2:10 a.m. | Dan Liu, Xi Chen, Jie Fu, Chen Ma, Xue Liu

cs.CV updates on arXiv.org

Inference time, model size, and accuracy are three key factors in deep model
compression. Most existing work addresses these factors separately, since it is
difficult to optimize all three at the same time.
For example, low-bit quantization aims at obtaining a faster model; weight
sharing quantization aims at improving compression ratio and accuracy; and
mixed-precision quantization aims at balancing accuracy and inference time. To
simultaneously optimize bit-width, model size, and accuracy, we propose pruning
ternary quantization …
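To make the term concrete, here is a minimal sketch of generic threshold-based ternary quantization, in which each weight is mapped to -1, 0, or +1 with one shared scale. The abstract is truncated before it describes the proposed method, so this is not the authors' algorithm. The function names, the sample weights, and the 0.7 threshold factor (a heuristic common in the ternary-weight literature) are all assumptions made for illustration.

```python
from typing import List, Tuple


def ternarize(weights: List[float], delta_factor: float = 0.7) -> Tuple[List[int], float]:
    """Map each weight to -1, 0, or +1 and return a shared scale.

    Assumption: threshold = delta_factor * mean(|w|). This is a common
    heuristic, not something stated in the abstract.
    """
    if not weights:
        return [], 0.0
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    threshold = delta_factor * mean_abs
    # Small-magnitude weights become 0; the rest keep only their sign.
    codes = [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]
    # Shared scale: mean magnitude of the weights that were not zeroed.
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale


def sparsity(codes: List[int]) -> float:
    """Fraction of weights mapped to zero."""
    return codes.count(0) / len(codes) if codes else 0.0


# Hypothetical weights, chosen only to show the mapping.
weights = [0.9, -0.05, 0.4, -0.8, 0.02, -0.3]
codes, scale = ternarize(weights)
print(codes, round(scale, 3), round(sparsity(codes), 3))
```

Each ternary code needs at most two bits, and the zeroed entries behave like pruned weights, which is one reason ternary schemes relate to both bit-width and model size.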

