Feb. 19, 2024, 6:35 a.m. | /u/kei147

Machine Learning www.reddit.com

A report by SemiAnalysis back in July said that GPT-4 was a 1.8T-parameter MoE model with 16 experts, each with 111B parameters. This is according to a summary I read, because I can't get past the paywall.

It seems like these two numbers line up because 16 × 111B = 1.776T, which is approximately equal to 1.8T.

But I've read that this is not the right way to calculate the total number of parameters in a mixture of …
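A minimal sketch of the counting question, assuming a typical MoE transformer layout in which only the feed-forward blocks are replicated per expert while attention, embedding, and norm parameters are shared. Every name and number below (`moe_total_params`, `shared`, `ffn_per_expert`, the 55B/107B figures) is a placeholder chosen for illustration, not an actual GPT-4 figure:

```python
def moe_total_params(shared_params: float, ffn_params_per_expert: float,
                     num_experts: int) -> float:
    """Count shared parameters (attention, embeddings, norms) once and
    per-expert feed-forward parameters once per expert."""
    return shared_params + num_experts * ffn_params_per_expert


# Placeholder numbers, chosen only to illustrate the arithmetic.
shared = 55e9            # assumed shared (non-expert) parameters
ffn_per_expert = 107e9   # assumed feed-forward parameters in one expert
experts = 16

total = moe_total_params(shared, ffn_per_expert, experts)
print(f"shared-aware total: {total / 1e12:.2f}T")  # ~1.77T

# If the per-expert figure already folds in the shared parameters,
# multiplying it by the expert count double-counts those parameters:
naive = experts * (shared + ffn_per_expert)
print(f"naive num_experts x per-expert: {naive / 1e12:.2f}T")  # ~2.59T
```

Under this sketch, whether 16 × 111B is the right total depends on whether the reported 111B per expert means the expert feed-forward blocks alone or a full per-expert slice that already includes the shared layers.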
