Dec. 29, 2023, 7 a.m. | /u/ytu876

Deep Learning www.reddit.com

Hi,

I'm following [Coding a transformer from scratch](https://www.youtube.com/watch?v=ISNdQcPhsts), and have a question about dropout. What's the criterion for dropout to be present in a component? My understanding is that dropout is there to prevent overfitting.

1. InputEmbedding has no dropout
2. LayerNormalization has no dropout
3. But something as simple as ResidualConnection (i.e. the Add in the "Add & Norm" part) has dropout (see the sketch below)

Is there any rule to determine whether a component should have dropout or not?
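For reference, here's roughly what I mean by the residual connection, written as a minimal PyTorch sketch in the pre-norm style used in the video (the class and argument names are my paraphrase, not the exact code from the tutorial):

```python
import torch
import torch.nn as nn

class ResidualConnection(nn.Module):
    """The 'Add' in Add & Norm: normalize, run the sublayer, apply dropout, add back."""

    def __init__(self, d_model: int, dropout: float) -> None:
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor, sublayer) -> torch.Tensor:
        # Dropout zeroes a random fraction of the sublayer's output
        # before it is added back onto the residual stream.
        return x + self.dropout(sublayer(self.norm(x)))
```

So dropout shows up here, but not in the embedding or normalization classes, and I don't see what distinguishes them.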



Thanks.

