Sept. 27, 2023, 8:57 a.m. | /u/rejectedlesbian

Machine Learning www.reddit.com

So if you go check the source code for GPT-2, you can clearly see that the norm happens inside the attention and MLP layers.

And that the add is separate. This is in the official OpenAI GitHub and is relatively easy to read: [https://github.com/openai/gpt-2/blob/master/src/model.py#L123-L130](https://github.com/openai/gpt-2/blob/master/src/model.py#L123-L130) (thanks KingsmanVince)
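The linked lines are the `block` function in model.py. A minimal sketch of that wiring in plain NumPy (not the repo's TensorFlow code; `attn` and `mlp` here are placeholder callables and the layer norm's learned gain/bias are omitted):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # plain layer norm over the last axis (learned gain/bias left out for brevity)
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gpt2_block(x, attn, mlp):
    # structure as in gpt-2/src/model.py: the norm is applied *inside* each
    # residual branch, and the plain add stays outside of it
    x = x + attn(layer_norm(x))   # x + attn(norm(x))
    x = x + mlp(layer_norm(x))    # x + mlp(norm(x))
    return x

# toy usage just to show the wiring
x = np.random.randn(4, 16)
attn = lambda h: 0.1 * h   # stand-ins for the real sublayers
mlp = lambda h: 0.1 * h
y = gpt2_block(x, attn, mlp)
```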



For some reason, all the online materials say there is a full norm layer before the MLP instead of inside of it.
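If those diagrams are read literally, the norm sits outside the residual branch, so the skip connection would add back the normalized activations rather than the raw input. A sketch of that reading, reusing the `layer_norm` helper and placeholder sublayers from above:

```python
def diagrammed_block(x, attn, mlp):
    # hypothetical reading of "norm as its own layer before the sublayer":
    # the norm is outside the residual branch, so the skip connection carries
    # the *normalized* activations, not the original input
    x = layer_norm(x)
    x = x + attn(x)
    x = layer_norm(x)
    x = x + mlp(x)
    return x
```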
