April 26, 2024, 9:21 a.m. | /u/kiockete

Machine Learning www.reddit.com

In the video ["A little guide to building Large Language Models in 2024", at 41:38](https://youtu.be/2-SPH9hIKT8?t=2498), the author discusses the limits on how large the batch size can be.



>Well, if you start to have a very large batch size, the model for each optimization step makes less efficient use of each token, because the batch size is so big that each token is kind of washed out in the optimization step. And roughly, it's a …
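The "washed out" intuition above can be made concrete with plain mini-batch SGD, where per-example gradients are averaged before the update. A minimal sketch (assuming standard gradient averaging; `update_contribution` is a hypothetical helper, not from the video) showing that a single token's gradient carries a weight of `lr / batch_size` in the parameter update, so it shrinks as the batch grows:

```python
def update_contribution(batch_size: int, lr: float = 1e-3) -> float:
    """Weight of one example's gradient in a single averaged SGD step.

    Mini-batch SGD updates theta -= lr * mean(per-example grads),
    so example i contributes lr * g_i / batch_size to the update.
    """
    return lr / batch_size


if __name__ == "__main__":
    for b in (256, 4096, 1_048_576):
        print(f"batch={b:>9}: per-example weight = {update_contribution(b):.2e}")
```

Under this view, quadrupling the batch size quarters each token's influence on the step, which is why very large batches make less efficient use of each individual token unless the learning rate (or the number of tokens seen) is scaled accordingly.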

