April 27, 2024, 3:46 a.m. | Jay

DEV Community (dev.to)

1.8 TRILLION parameters across 120 layers, making it roughly 10 times larger than GPT-3!

16 EXPERTS within the model, each with 111 BILLION parameters for its MLP layers!

13 TRILLION tokens of training data, including both text and code, with some fine-tuning data from ScaleAI and from internal sources!

$63 MILLION in training costs, taking into account compute and training time!

3 TIMES MORE expensive to run than the 175B-parameter Davinci, due to the larger clusters required and lower utilization rates!

128 GPUs for inference, using …
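Taking the leaked figures above at face value, here is a minimal back-of-envelope sketch in Python that checks how the rumored per-expert MLP parameter count relates to the rumored 1.8T total and to GPT-3's published 175B size. Every constant below is a rumored number quoted in this post, not a confirmed specification, and the "implied remainder" is simply the arithmetic gap left over by those numbers.

```python
# Back-of-envelope check of the rumored GPT-4 figures quoted above.
# All constants are leaked/rumored values, not confirmed by OpenAI.

NUM_EXPERTS = 16                  # rumored number of MoE experts
MLP_PARAMS_PER_EXPERT = 111e9     # rumored MLP parameters per expert
TOTAL_PARAMS_RUMORED = 1.8e12     # rumored total parameter count
GPT3_DAVINCI_PARAMS = 175e9       # published GPT-3 Davinci size

expert_mlp_total = NUM_EXPERTS * MLP_PARAMS_PER_EXPERT
# Whatever is left of the rumored total (attention, embeddings, etc.) is
# only implied by the quoted numbers, not stated in the leak itself.
remainder = TOTAL_PARAMS_RUMORED - expert_mlp_total

print(f"Expert MLP parameters combined: {expert_mlp_total / 1e12:.2f}T")   # ~1.78T
print(f"Implied remainder of the 1.8T total: {remainder / 1e9:.0f}B")      # ~24B
print(f"Rumored total vs. GPT-3 Davinci: "
      f"{TOTAL_PARAMS_RUMORED / GPT3_DAVINCI_PARAMS:.1f}x")                # ~10.3x
```

The point is only that the quoted numbers are internally consistent: sixteen 111B-parameter experts account for almost all of the claimed 1.8T total, and that total is indeed about ten times GPT-3's 175B.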
