Dec. 6, 2023, 9:28 p.m. | /u/killerstorm

Machine Learning www.reddit.com

Transformers heavily rely on MLPs. Presumably, facts which LLMs can recall are stored in MLPs. (E.g. see ROME paper.) These MLPs are huge.

E.g. consider GPT-Neo-350M. Each MLP has 1024-element input and output layers and a 4096-element inner layer. That requires 2 x 1024 x 4096 ≈ 8.4M weights per block. The whole model has 24 of them (one per layer), so roughly 200M weights are used for MLPs.
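A minimal sketch of one such block, assuming the standard two-layer GELU MLP with the dimensions above (the class and attribute names here are illustrative, not GPT-Neo's actual module names):

```python
import torch
import torch.nn as nn

class MLPBlock(nn.Module):
    """One transformer MLP block: project up to the inner dim,
    apply a nonlinearity, project back down (1024 -> 4096 -> 1024)."""
    def __init__(self, d_model: int = 1024, d_inner: int = 4096):
        super().__init__()
        self.fc_in = nn.Linear(d_model, d_inner)   # 1024 x 4096 weight matrix (+ bias)
        self.fc_out = nn.Linear(d_inner, d_model)  # 4096 x 1024 weight matrix (+ bias)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc_out(self.act(self.fc_in(x)))

mlp = MLPBlock()
# Count only the 2-D weight matrices, ignoring biases.
weights = sum(p.numel() for p in mlp.parameters() if p.dim() == 2)
print(weights)        # 8_388_608  -> ~8.4M weights per block
print(24 * weights)   # ~201M weights across 24 layers
```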

And yet each of these MLPs basically just does 4096 dot products to compute its features. Millions of weights just to calculate …
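To make the "dot products" framing concrete: for a single token, the up-projection is just 4096 dot products between the token's hidden state and the rows of the first weight matrix. A short illustrative sketch (NumPy, random data, shapes as in the post):

```python
import numpy as np

d_model, d_inner = 1024, 4096
W_in = np.random.randn(d_inner, d_model)  # first MLP matrix: 4096 "feature detectors"
x = np.random.randn(d_model)              # one token's hidden state

# Each feature activation is one dot product of x with one row of W_in.
features = np.array([W_in[i] @ x for i in range(d_inner)])

# Which is exactly the same thing as a single matrix-vector product.
assert np.allclose(features, W_in @ x)
```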

