Web: https://www.reddit.com/r/MachineLearning/comments/sdjqab/d_on_initialization_schemes_for_mlps_practice_and/

Jan. 26, 2022, 11:31 p.m. | /u/carlml


In this post I am thinking only about MLPs with the ReLU activation function.

The default PyTorch initialization for linear layers draws weights from a uniform distribution centered at 0 whose limits depend on the input dimension. Many papers instead assume initialization from a Gaussian distribution with zero mean and a specified variance.
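For concreteness, a minimal sketch (assuming standard `torch.nn` APIs) contrasting the default uniform initialization of `nn.Linear` with a zero-mean Gaussian re-initialization; the 2/fan_in variance shown is the common He/Kaiming choice for ReLU from the literature, not something the post fixes:

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(in_features=256, out_features=128)

# Default PyTorch behaviour: weights are drawn from U(-bound, bound) with
# bound = 1/sqrt(fan_in), so the spread shrinks as the input dimension grows.
bound = 1.0 / math.sqrt(layer.in_features)
assert layer.weight.abs().max().item() <= bound

# Gaussian alternative assumed in many papers: zero mean and a chosen variance.
# Here the variance is 2/fan_in (He/Kaiming scaling for ReLU) as one example.
nn.init.normal_(layer.weight, mean=0.0, std=math.sqrt(2.0 / layer.in_features))
nn.init.zeros_(layer.bias)
```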

There is also work [Pennington et al. (2017)] that proposes orthogonal initialization to achieve what they call dynamical isometry, meaning the singular values of the input-output Jacobian are all close to 1 (or stay …
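A minimal sketch of orthogonal initialization for an MLP's linear layers using `torch.nn.init.orthogonal_`; the helper name `init_orthogonal` and the layer sizes are illustrative, not from the post or the paper:

```python
import torch
import torch.nn as nn

def init_orthogonal(mlp: nn.Module, gain: float = 1.0) -> nn.Module:
    # Re-initialize every linear layer with an orthogonal weight matrix.
    for module in mlp.modules():
        if isinstance(module, nn.Linear):
            nn.init.orthogonal_(module.weight, gain=gain)
            nn.init.zeros_(module.bias)
    return mlp

mlp = init_orthogonal(nn.Sequential(
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 10),
))

# Sanity check: for a square orthogonal W, W^T W is (numerically) the identity,
# so that layer's Jacobian w.r.t. its input has all singular values equal to 1.
W = mlp[0].weight.detach()
print(torch.allclose(W.T @ W, torch.eye(128), atol=1e-5))
```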

