While neural networks have been remarkably successful in a wide array of
applications, implementing them on resource-constrained hardware remains an
area of intense research. By replacing the weights of a neural network with
quantized (e.g., 4-bit or binary) counterparts, massive savings in computation
cost, memory, and power consumption can be attained. We modify a post-training
neural-network quantization method, GPFQ, that is based on a greedy
path-following mechanism, and rigorously analyze its error. We prove that for
quantizing a single-layer network, the …
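
To make the idea of greedy path-following quantization concrete, the following is a minimal sketch under stated assumptions, not the modified method analyzed in this work. It assumes the commonly described formulation in which the weights of a single neuron are quantized one at a time, each choice made so that the running error on a batch of input data stays small. The names `nearest_level` and `greedy_quantize_neuron`, the data matrix `X`, and the 16-level alphabet are hypothetical and chosen only for illustration.

```python
import numpy as np


def nearest_level(value, alphabet):
    """Return the element of `alphabet` closest to `value`."""
    return alphabet[np.argmin(np.abs(alphabet - value))]


def greedy_quantize_neuron(w, X, alphabet):
    """Greedily quantize one neuron's weights (illustrative sketch).

    w        : (N,) real-valued weights of a single neuron
    X        : (m, N) data matrix; column X[:, t] multiplies weight w[t]
    alphabet : 1-D array of allowed quantized values (assumed, not from the source)

    Returns the quantized weights q and the final error vector u = X @ (w - q).
    """
    u = np.zeros(X.shape[0])          # running error X[:, :t] @ (w[:t] - q[:t])
    q = np.zeros(w.shape[0])
    for t in range(w.shape[0]):
        x_t = X[:, t]
        norm_sq = x_t @ x_t
        if norm_sq == 0.0:
            # A zero column cannot affect the error; fall back to plain rounding.
            q[t] = nearest_level(w[t], alphabet)
            continue
        # Best real-valued coefficient for x_t given the accumulated error,
        # then rounded to the nearest allowed level.
        target = x_t @ (u + w[t] * x_t) / norm_sq
        q[t] = nearest_level(target, alphabet)
        u = u + (w[t] - q[t]) * x_t   # update the running error
    return q, u


# Usage with synthetic data and an assumed 4-bit (16-level) uniform alphabet.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((64, 32))
w_demo = rng.uniform(-1.0, 1.0, size=32)
alphabet_4bit = np.linspace(-1.0, 1.0, 16)
q_demo, u_demo = greedy_quantize_neuron(w_demo, X_demo, alphabet_4bit)
```

Each step only needs the current column of the data and the running error vector, so the sketch makes a single pass over the weights; how the actual method treats multiple layers, alphabet selection, or the modifications mentioned above is not shown here.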

