[D] Critical batch size and LLMs

April 26, 2024, 9:21 a.m. | /u/kiockete

Machine Learning www.reddit.com

In the video ["A little guide to building Large Language Models in 2024"](https://youtu.be/2-SPH9hIKT8?t=2498), at 41:38, the author starts talking about the limits on how large the batch size can be:

>Well, if you start to have a very large batch size, the model for each optimization step makes less efficient use of each token, because the batch size is so big that each token is kind of washed out in the optimization step. And roughly, it's a …
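
To make the "washed out" intuition concrete, here is a minimal sketch of the steps-versus-tokens trade-off. It is not taken from the video: it assumes the relation from the large-batch training literature (McCandlish et al., 2018), `(S / S_min - 1) * (E / E_min - 1) = 1`, where `S` is the number of optimizer steps, `E` the number of tokens processed, and `B_crit = E_min / S_min`. The function name and the numbers for `B_CRIT` and `S_MIN` are hypothetical, chosen only for illustration.

```python
def training_cost(batch_size: float, b_crit: float, s_min: float) -> tuple[float, float]:
    """Return (optimizer steps, tokens processed) needed to reach a fixed loss.

    Assumes S = S_min * (1 + B_crit / B), which together with E = B * S
    gives E = E_min * (1 + B / B_crit) where E_min = S_min * B_crit.
    """
    steps = s_min * (1.0 + b_crit / batch_size)
    tokens = batch_size * steps
    return steps, tokens


# Hypothetical values, for illustration only.
B_CRIT = 1_000_000        # critical batch size, in tokens
S_MIN = 10_000            # fewest steps that could reach the target loss
E_MIN = S_MIN * B_CRIT    # fewest tokens that could reach the target loss

for multiple in (0.1, 1, 10):
    steps, tokens = training_cost(multiple * B_CRIT, B_CRIT, S_MIN)
    print(
        f"B = {multiple:>4} x B_crit: "
        f"{steps / S_MIN:5.2f} x S_min steps, "
        f"{tokens / E_MIN:5.2f} x E_min tokens"
    )
```

Under this assumed model, training at `B = B_crit` costs twice the minimum in both steps and tokens. Going to `10 x B_crit` saves almost no further steps (1.1 x `S_min`) while the token cost rises to 11 x `E_min`, which is the sense in which each token contributes less per optimization step.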
