The Marginal Value of Momentum for Small Learning Rate SGD
April 17, 2024, 4:43 a.m. | Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, Zhiyuan Li
cs.LG updates on arXiv.org
Abstract: Momentum is known to accelerate the convergence of gradient descent in strongly convex settings without stochastic gradient noise. In stochastic optimization, such as training neural networks, folklore suggests that momentum may help deep learning optimization by reducing the variance of the stochastic gradient update, but previous theoretical analyses do not find momentum to offer any provable acceleration. Theoretical results in this paper clarify the role of momentum in stochastic settings where the learning rate is …
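The folklore view the abstract refers to is the standard heavy-ball (SGD with momentum) update, in which the stochastic gradient is accumulated into a velocity buffer before the parameter step. A minimal sketch of that update, with an illustrative noisy quadratic to show the averaging effect (the function name and problem are illustrative, not from the paper):

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.01, beta=0.9):
    # Heavy-ball update: v <- beta * v + grad, then w <- w - lr * v.
    # The velocity v is an exponential moving sum of past gradients,
    # which is why momentum is often said to reduce gradient-noise variance.
    v = beta * v + grad
    w = w - lr * v
    return w, v

# Illustration: minimize f(w) = 0.5 * w^2 using noisy gradients.
rng = np.random.default_rng(0)
w, v = np.array([5.0]), np.zeros(1)
for _ in range(500):
    grad = w + 0.1 * rng.standard_normal(1)  # stochastic gradient of f
    w, v = sgd_momentum_step(w, v, grad)
# w ends up near the minimizer 0.
```

With a small learning rate `lr` and momentum `beta`, the effective step size is roughly `lr / (1 - beta)`; the paper's question is whether the velocity averaging adds anything beyond what plain SGD at that effective step size already achieves.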