Web: http://arxiv.org/abs/2206.11062

June 23, 2022, 1:10 a.m. | Ibrahim Ahmed, Sahil Parmar, Matthew Boyd, Michael Beidler, Kris Kang, Bill Liu, Kyle Roach, John Kim, Dennis Abts

cs.LG updates on arXiv.org

Transformers have become a predominant machine learning workload. They are
not only the de facto standard for natural language processing tasks but are
also being deployed in other domains such as vision and speech recognition.
Many transformer-based applications, such as machine translation and web
search, are real-time systems. These real-time systems often come with
strict end-to-end inference latency requirements. Unfortunately, while the
majority of the transformer computation comes from matrix multiplications,
transformers also include several non-linear components …

