April 26, 2024, 9:21 a.m. | /u/kiockete

Machine Learning www.reddit.com

In the video ["A little guide to building Large Language Models in 2024", at 41:38](https://youtu.be/2-SPH9hIKT8?t=2498), the author discusses the limits on how large the batch size can be.



>Well, if you start to have a very large batch size, the model for each optimization step makes less efficient use of each token, because the batch size is so big that each token is kind of washed out in the optimization step. And roughly, it's a …
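The "washing out" the author describes can be made concrete: with the usual mean-reduced loss, the averaged gradient weights each token by the reciprocal of the number of tokens in the batch, so a larger batch shrinks any single token's influence on the step. A minimal sketch of that arithmetic (my own illustration, not from the video; the function name and numbers are made up for the example):

```python
# Sketch: per-token weight in a mean-averaged gradient step.
# With mean reduction, one token's gradient enters the update with
# weight 1 / (batch_size * seq_len), so doubling the batch halves
# any single token's influence on that optimization step.

def per_token_weight(batch_size: int, seq_len: int) -> float:
    """Weight of one token's gradient in a mean-reduced update."""
    return 1.0 / (batch_size * seq_len)

for bs in (32, 256, 2048):
    w = per_token_weight(bs, seq_len=1024)
    print(f"batch={bs:5d}  per-token weight = {w:.2e}")
```

This is only the averaging arithmetic; the video's actual argument about efficient batch sizes involves gradient noise and training dynamics beyond this simple ratio.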
