Aug. 30, 2022, 4:01 p.m. | Poulinakis Kon

Towards AI (pub.towardsai.net)

Is GELU the ReLU Successor?


Can we combine regularization and activation functions? In 2016, Dan Hendrycks and Kevin Gimpel published a paper exploring exactly that; it has since been updated four times. In it, the authors introduced a new activation function: the Gaussian Error Linear Unit (GELU).

Demystifying GELU

The motivation behind GELU is to bridge stochastic regularizers, such as dropout, with non-linearities, i.e., activation functions.
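Concretely, the paper defines GELU(x) = x · Φ(x), where Φ is the standard Gaussian CDF, so each input is weighted by its value relative to a standard normal distribution. Below is a minimal NumPy/SciPy sketch of the exact form and of the tanh approximation given in the paper; the function names are illustrative, not the paper's reference code.

```python
import numpy as np
from scipy.stats import norm

def gelu(x):
    # Exact GELU: weight the input by Phi(x), the standard Gaussian CDF.
    return x * norm.cdf(x)

def gelu_tanh(x):
    # Cheaper tanh approximation from the paper:
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))
```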

Dropout regularization stochastically multiplies a neuron’s …
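For contrast, here is a minimal sketch of the stochastic masking that dropout performs (an illustrative inverted-dropout implementation, not code from the article). The paper motivates GELU as the expected value of such a zero-one mask when the keep probability is made input-dependent, namely Φ(x).

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5):
    # Multiply each activation by an input-independent 0/1 Bernoulli mask;
    # the 1/(1-p) scaling keeps the expected activation unchanged.
    mask = (rng.random(x.shape) >= p).astype(x.dtype)
    return x * mask / (1.0 - p)
```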

