Feb. 12, 2024, 9:09 p.m. | /u/kei147

Machine Learning www.reddit.com

A large fraction of recently released LLMs are using RMSNorm instead of LayerNorm.

The original RMSNorm paper (https://arxiv.org/pdf/1910.07467.pdf) and most references I've seen argue that RMSNorm is better than LayerNorm because it is much more computationally efficient.

However, LayerNorm is a tiny fraction of overall compute, so it's not clear to me why that speedup would help very much. Asymptotically, LayerNorm is O(d_model), while there are components like the MLP that are O(d_model^2), or attention that is O(d_model*seq_len + …
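
For reference, a minimal PyTorch sketch (illustrative, not taken from the post) of what each norm computes: RMSNorm drops LayerNorm's mean subtraction and bias term, so per token it does one reduction over d_model rather than two, and both remain O(d_model).

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Minimal RMSNorm (Zhang & Sennrich, 2019): rescale by the root mean
    square of the activations. Unlike LayerNorm there is no mean subtraction
    and no bias, so it needs one reduction over d_model instead of two."""
    def __init__(self, d_model, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(d_model))

    def forward(self, x):
        # rms(x) = sqrt(mean(x**2)); still O(d_model) per token, same as LayerNorm
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * inv_rms * self.weight

d_model = 512
x = torch.randn(4, 128, d_model)        # (batch, seq_len, d_model)
print(RMSNorm(d_model)(x).shape)        # torch.Size([4, 128, 512])
print(nn.LayerNorm(d_model)(x).shape)   # same shape; LayerNorm adds mean-centering and a bias
```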

