Sept. 27, 2023, 8:57 a.m. | /u/rejectedlesbian

Machine Learning www.reddit.com

So if you go check the source code for GPT-2, you can clearly see that the norm happens inside the attention and MLP layers.

And that the add is separate. This is in the official OpenAI GitHub and is relatively easy to read: [https://github.com/openai/gpt-2/blob/master/src/model.py#L123-L130](https://github.com/openai/gpt-2/blob/master/src/model.py#L123-L130) (thanks KingsmanVince)
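The linked lines are the `block` function in model.py. A minimal sketch of that wiring in plain NumPy (not the repo's TensorFlow code; `attn` and `mlp` here are placeholder callables and the layer norm's learned gain/bias are omitted):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # plain layer norm over the last axis (learned gain/bias left out for brevity)
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gpt2_block(x, attn, mlp):
    # structure as in gpt-2/src/model.py: the norm is applied *inside* each
    # residual branch, and the plain add stays outside of it
    x = x + attn(layer_norm(x))   # x + attn(norm(x))
    x = x + mlp(layer_norm(x))    # x + mlp(norm(x))
    return x

# toy usage just to show the wiring
x = np.random.randn(4, 16)
attn = lambda h: 0.1 * h   # stand-ins for the real sublayers
mlp = lambda h: 0.1 * h
y = gpt2_block(x, attn, mlp)
```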



For some reason, all the online materials say there is a full norm layer before the MLP instead of inside of it.
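If those diagrams are read literally, the norm sits outside the residual branch, so the skip connection would add back the normalized activations rather than the raw input. A sketch of that reading, reusing the `layer_norm` helper and placeholder sublayers from above:

```python
def diagrammed_block(x, attn, mlp):
    # hypothetical reading of "norm as its own layer before the sublayer":
    # the norm is outside the residual branch, so the skip connection carries
    # the *normalized* activations, not the original input
    x = layer_norm(x)
    x = x + attn(x)
    x = layer_norm(x)
    x = x + mlp(x)
    return x
```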
