Faster Dynamically Quantized Inference with XNNPack
The TensorFlow Blog (blog.tensorflow.org)
Posted by Alan Kelly, Software Engineer
We are excited to announce that XNNPack’s Fully Connected and Convolution 2D operators now support dynamic range quantization. XNNPack is TensorFlow Lite’s CPU backend. Because CPUs deliver the widest reach for ML inference and remain the default target for TensorFlow Lite, improving CPU inference performance is a top priority. By adding support for dynamic range quantization, we quadrupled inference performance in TensorFlow Lite’s XNNPack backend compared to the single-precision baseline …
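To make the idea concrete, here is a minimal NumPy sketch of dynamic range quantization for a fully connected layer (an illustration of the technique, not XNNPack’s actual kernels): weights are quantized to int8 ahead of time with per-channel scales, while activations are quantized on the fly at inference, so the matrix multiply runs on integer values and the result is rescaled back to float.

```python
import numpy as np

def quantize_per_channel(weights):
    # Symmetric int8 quantization of a weight matrix, one scale per
    # output channel (row), done once ahead of inference.
    scales = np.abs(weights).max(axis=1, keepdims=True) / 127.0
    q = np.round(weights / scales).astype(np.int8)
    return q, scales

def dynamic_fully_connected(x, q_weights, w_scales):
    # "Dynamic range": the activation scale is computed from the actual
    # input at inference time, then the matmul runs on int8 inputs
    # accumulated in int32, and the result is rescaled to float32.
    x_scale = np.abs(x).max() / 127.0
    qx = np.round(x / x_scale).astype(np.int8)
    acc = qx.astype(np.int32) @ q_weights.T.astype(np.int32)
    return acc.astype(np.float32) * (x_scale * w_scales.T)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)   # 4 output channels
x = rng.standard_normal((2, 8)).astype(np.float32)   # batch of 2

qw, ws = quantize_per_channel(w)
approx = dynamic_fully_connected(x, qw, ws)
exact = x @ w.T
print(np.max(np.abs(approx - exact)))  # small quantization error
```

The weights shrink to a quarter of their float32 size, and the inner loop works on int8 data, which is what allows the speedups described above; the quantization error stays small because each output channel and each activation tensor gets its own scale.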