Unified Normalization for Accelerating and Stabilizing Transformers. (arXiv:2208.01313v1 [cs.CV])
Aug. 3, 2022, 1:12 a.m. | Qiming Yang, Kai Zhang, Chaoxiang Lan, Zhi Yang, Zheyang Li, Wenming Tan, Jun Xiao, Shiliang Pu
cs.CV updates on arXiv.org arxiv.org
Strong results have made Transformers the prevailing architecture in
various natural language and vision tasks. As a default component in
Transformers, Layer Normalization (LN) normalizes activations within each token
to boost the robustness. However, LN requires on-the-fly statistics calculation
in inference as well as division and square root operations, leading to
inefficiency on hardware. Moreover, replacing LN with other
hardware-efficient normalization schemes (e.g., Batch Normalization) results in
inferior performance, or even collapse during training. We find that this …
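The hardware cost the abstract describes can be illustrated with a minimal NumPy sketch (an illustration, not the paper's proposed method): LN must recompute per-token statistics at inference, including a division and a square root, whereas BN's running statistics are fixed at inference and can be folded offline into a single multiply-add.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # x: (batch, tokens, features). Statistics are taken over the
    # feature axis, so they must be computed on the fly per token --
    # the inference-time division and square root the abstract mentions.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def batch_norm_inference(x, gamma, beta, running_mean, running_var, eps=1e-5):
    # BN at inference uses fixed running statistics, so the scale and
    # shift can be precomputed offline; the runtime op is one fused
    # multiply-add, which is far cheaper on hardware.
    scale = gamma / np.sqrt(running_var + eps)
    shift = beta - running_mean * scale
    return x * scale + shift
```

After folding, `batch_norm_inference` involves no runtime division or square root, which is why BN-style schemes are attractive on hardware despite the training instabilities the abstract notes.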