Dec. 6, 2023, 9:28 p.m. | /u/killerstorm

Machine Learning www.reddit.com

Transformers heavily rely on MLPs. Presumably, facts which LLMs can recall are stored in MLPs. (E.g. see ROME paper.) These MLPs are huge.

E.g. consider GPT-Neo-350M. Each MLP has 1024-element input and output layers and a 4096-element inner layer. That requires 2 x 1024 x 4096 ≈ 8.4M weights per block. The whole model has 24 of them (one per layer), so roughly 200M weights are used for MLPs.
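A minimal sketch of one such block, assuming the standard two-layer GELU MLP with the dimensions above (the class and attribute names here are illustrative, not GPT-Neo's actual module names):

```python
import torch
import torch.nn as nn

class MLPBlock(nn.Module):
    """One transformer MLP block: project up to the inner dim,
    apply a nonlinearity, project back down (1024 -> 4096 -> 1024)."""
    def __init__(self, d_model: int = 1024, d_inner: int = 4096):
        super().__init__()
        self.fc_in = nn.Linear(d_model, d_inner)   # 1024 x 4096 weight matrix (+ bias)
        self.fc_out = nn.Linear(d_inner, d_model)  # 4096 x 1024 weight matrix (+ bias)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc_out(self.act(self.fc_in(x)))

mlp = MLPBlock()
# Count only the 2-D weight matrices, ignoring biases.
weights = sum(p.numel() for p in mlp.parameters() if p.dim() == 2)
print(weights)        # 8_388_608  -> ~8.4M weights per block
print(24 * weights)   # ~201M weights across 24 layers
```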

And yet each of these MLPs basically just does 4096 dot products to compute its features. Millions of weights just to calculate …
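To make the "dot products" framing concrete: for a single token, the up-projection is just 4096 dot products between the token's hidden state and the rows of the first weight matrix. A short illustrative sketch (NumPy, random data, shapes as in the post):

```python
import numpy as np

d_model, d_inner = 1024, 4096
W_in = np.random.randn(d_inner, d_model)  # first MLP matrix: 4096 "feature detectors"
x = np.random.randn(d_model)              # one token's hidden state

# Each feature activation is one dot product of x with one row of W_in.
features = np.array([W_in[i] @ x for i in range(d_inner)])

# Which is exactly the same thing as a single matrix-vector product.
assert np.allclose(features, W_in @ x)
```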

