Feb. 7, 2024, 5:43 a.m. | Yusu Hong Junhong Lin

cs.LG updates on arXiv.org arxiv.org

The Adaptive Momentum Estimation (Adam) algorithm is highly effective in training various deep learning tasks. Despite this, there's limited theoretical understanding for Adam, especially when focusing on its vanilla form in non-convex smooth scenarios with potential unbounded gradients and affine variance noise. In this paper, we study vanilla Adam under these challenging conditions. We introduce a comprehensive noise model which governs affine variance noise, bounded noise and sub-Gaussian noise. We show that Adam can find a stationary point with a …

adam algorithm assumptions convergence cs.lg deep learning form math.oc noise optimization paper stat.ml stochastic study tasks training understanding variance

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Developer AI Senior Staff Engineer, Machine Learning

@ Google | Sunnyvale, CA, USA; New York City, USA

Engineer* Cloud & Data Operations (f/m/d)

@ SICK Sensor Intelligence | Waldkirch (bei Freiburg), DE, 79183