Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training

Jan. 1, 2022, midnight | Diego Granziol, Stefan Zohren, Stephen Roberts

JMLR www.jmlr.org

We study the effect of mini-batching on the loss landscape of deep neural networks using spiked, field-dependent random matrix theory. We demonstrate that the magnitude of the extremal values of the batch Hessian is larger than that of the empirical Hessian. We also derive similar results for the Generalised Gauss-Newton matrix approximation of the Hessian. As a consequence of our theorems, we derive analytical expressions for the maximal learning rates as a function of batch size, informing practical training …
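The qualitative claim in the abstract can be illustrated on a toy problem. The sketch below is not the paper's derivation or its learning-rate formula: it assumes a least-squares loss on random Gaussian data, for which the Hessian is X^T X / n, and it uses the standard fact that gradient descent on a quadratic is stable only for learning rates below 2 / lambda_max. The helper names `top_eigenvalue` and `max_stable_lr`, the problem sizes, and the batch sizes are all hypothetical choices made for illustration.

```python
import numpy as np


def top_eigenvalue(X):
    """Largest eigenvalue of the least-squares Hessian X^T X / n."""
    H = X.T @ X / X.shape[0]
    # eigvalsh returns eigenvalues in ascending order for symmetric matrices.
    return float(np.linalg.eigvalsh(H)[-1])


def max_stable_lr(lam_max):
    """Stability bound for gradient descent on a quadratic: lr < 2 / lambda_max."""
    return 2.0 / lam_max


rng = np.random.default_rng(0)
n, d = 5000, 50  # assumed toy sizes: n samples, d parameters
X = rng.standard_normal((n, d))

# Top eigenvalue of the full-data (empirical) Hessian.
lam_full = top_eigenvalue(X)
print(f"full data : lambda_max={lam_full:.3f}  lr bound={max_stable_lr(lam_full):.3f}")

# Average top eigenvalue of the mini-batch Hessian over random batches.
batch_lams = {}
for b in (50, 200, 1000):
    lams = []
    for _ in range(20):
        idx = rng.choice(n, size=b, replace=False)
        lams.append(top_eigenvalue(X[idx]))
    batch_lams[b] = float(np.mean(lams))
    print(f"batch {b:>4}: lambda_max={batch_lams[b]:.3f}  "
          f"lr bound={max_stable_lr(batch_lams[b]):.3f}")
```

In this toy setting the largest eigenvalue of the batch Hessian grows as the batch shrinks, so the learning-rate bound tightens. How the bound scales with batch size for deep networks is the subject of the paper itself and is not reproduced here.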
