Feb. 19, 2024, 6:35 a.m. | /u/kei147

Machine Learning www.reddit.com

A report by SemiAnalysis back in July said that GPT-4 was a 1.8T-parameter MoE model with 16 experts, each with 111B parameters. This is according to a summary I read, because I can't get past the paywall.

It seems like these two numbers line up because 16 × 111B = 1.776T, which is approximately equal to 1.8T.

But I've read that this is not the right way to calculate the total number of parameters in a mixture of …
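A minimal sketch of the counting question, assuming a typical MoE transformer layout in which only the feed-forward blocks are replicated per expert while attention, embedding, and norm parameters are shared. Every name and number below (`moe_total_params`, `shared`, `ffn_per_expert`, the 55B/107B figures) is a placeholder chosen for illustration, not an actual GPT-4 figure:

```python
def moe_total_params(shared_params: float, ffn_params_per_expert: float,
                     num_experts: int) -> float:
    """Count shared parameters (attention, embeddings, norms) once and
    per-expert feed-forward parameters once per expert."""
    return shared_params + num_experts * ffn_params_per_expert


# Placeholder numbers, chosen only to illustrate the arithmetic.
shared = 55e9            # assumed shared (non-expert) parameters
ffn_per_expert = 107e9   # assumed feed-forward parameters in one expert
experts = 16

total = moe_total_params(shared, ffn_per_expert, experts)
print(f"shared-aware total: {total / 1e12:.2f}T")  # ~1.77T

# If the per-expert figure already folds in the shared parameters,
# multiplying it by the expert count double-counts those parameters:
naive = experts * (shared + ffn_per_expert)
print(f"naive num_experts x per-expert: {naive / 1e12:.2f}T")  # ~2.59T
```

Under this sketch, whether 16 × 111B is the right total depends on whether the reported 111B per expert means the expert feed-forward blocks alone or a full per-expert slice that already includes the shared layers.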
