Acceleration of stochastic gradient descent with momentum by averaging: finite-sample rates and asymptotic normality

Feb. 2, 2024, 3:46 p.m. | Kejie Tang Weidong Liu Yichen Zhang Xi Chen

cs.LG updates on arXiv.org | arxiv.org

Stochastic gradient descent with momentum (SGDM) is widely used in machine learning and statistical applications. Despite the observed empirical benefits of SGDM over traditional SGD, the theoretical understanding of the role of momentum for different learning rates in the optimization process remains largely open. We analyze the finite-sample convergence rate of SGDM in the strongly convex setting and show that, with a large batch size, mini-batch SGDM converges faster than mini-batch SGD to a neighborhood of …
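To make the objects in the abstract concrete, here is a minimal Python sketch of mini-batch SGDM with a running average of the iterates, run on a toy strongly convex least-squares problem. The abstract does not state the update rule, so the heavy-ball form below, the uniform averaging scheme, the toy problem, and every name (`minibatch_sgdm_averaged`, `least_squares_grad`, `X_STAR`, `NOISE_STD`) are assumptions made for illustration, not the authors' algorithm or experimental setup.

```python
import numpy as np


def minibatch_sgdm_averaged(grad_fn, x0, lr, momentum, n_steps, batch_size, seed=0):
    """Heavy-ball mini-batch SGDM with a running average of the iterates.

    Assumed update (an illustration, not taken from the paper):
        v_t = momentum * v_{t-1} + g_t
        x_t = x_{t-1} - lr * v_t
    Setting momentum=0 recovers plain mini-batch SGD.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    v = np.zeros_like(x)       # momentum buffer
    x_bar = np.zeros_like(x)   # running mean of x_1, ..., x_t
    for t in range(1, n_steps + 1):
        g = grad_fn(x, batch_size, rng)   # stochastic mini-batch gradient
        v = momentum * v + g
        x = x - lr * v
        x_bar += (x - x_bar) / t          # incremental mean update
    return x, x_bar


# Hypothetical toy problem: least squares with standard Gaussian features,
# f(x) = 0.5 * E[(a^T x - b)^2], which is strongly convex with minimizer X_STAR.
X_STAR = np.ones(5)
NOISE_STD = 0.1


def least_squares_grad(x, batch_size, rng):
    """Mini-batch gradient of the toy least-squares objective."""
    A = rng.normal(size=(batch_size, x.size))
    b = A @ X_STAR + NOISE_STD * rng.normal(size=batch_size)
    return A.T @ (A @ x - b) / batch_size


x_last, x_avg = minibatch_sgdm_averaged(
    least_squares_grad, np.zeros(5),
    lr=0.02, momentum=0.9, n_steps=2000, batch_size=32,
)
print("last-iterate error:", np.linalg.norm(x_last - X_STAR))
print("averaged-iterate error:", np.linalg.norm(x_avg - X_STAR))
```

Calling the same function with `momentum=0.0` gives a plain mini-batch SGD baseline for comparison. The average here includes the initial transient; discarding a burn-in period before averaging is a common variation, and nothing in this sketch should be read as reproducing the paper's rates.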


