Adaptive Gradient Methods at the Edge of Stability

April 17, 2024, 4:43 a.m. | Jeremy M. Cohen, Behrooz Ghorbani, Shankar Krishnan, Naman Agarwal, Sourabh Medapati, Michal Badura, Daniel Suo, David Cardoze, Zachary Nado, George E

cs.LG updates on arXiv.org

arXiv:2207.14484v2
Abstract: Very little is known about the training dynamics of adaptive gradient methods like Adam in deep learning. In this paper, we shed light on the behavior of these algorithms in the full-batch and sufficiently large batch settings. Specifically, we empirically demonstrate that during full-batch training, the maximum eigenvalue of the preconditioned Hessian typically equilibrates at a certain numerical value -- the stability threshold of a gradient descent algorithm. For Adam with step size $\eta$ and …
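
To make the idea of a stability threshold concrete, here is a minimal sketch on a one-dimensional quadratic. It is an illustration, not the paper's experiment. It assumes plain preconditioned gradient descent without momentum, for which the well-known threshold on a quadratic is 2/η. The Adam-specific threshold is cut off in the abstract above and is not reproduced here. The function names, the fixed scalar preconditioner, and the chosen numbers are all hypothetical.

```python
def run_preconditioned_gd(curvature, precond, eta, steps=50, x0=1.0):
    """Minimize f(x) = 0.5 * curvature * x**2 with a fixed scalar preconditioner.

    Update rule (assumed for illustration): x <- x - eta * grad / precond.
    Returns |x| after `steps` iterations.
    """
    x = x0
    for _ in range(steps):
        grad = curvature * x           # gradient of the quadratic
        x = x - eta * grad / precond   # preconditioned gradient step
    return abs(x)


def is_stable(curvature, precond, eta):
    """True if the preconditioned curvature is below 2 / eta.

    In one dimension the preconditioned Hessian is just curvature / precond,
    so its maximum eigenvalue is that same number.
    """
    return curvature / precond < 2.0 / eta


eta = 0.1  # step size; the threshold 2 / eta is then 20

# Preconditioned curvature 19 is below the threshold: iterates shrink.
below = run_preconditioned_gd(curvature=19.0, precond=1.0, eta=eta)

# Preconditioned curvature 21 is above the threshold: iterates grow.
above = run_preconditioned_gd(curvature=21.0, precond=1.0, eta=eta)

# Same raw curvature, larger preconditioner: 21 / 2 = 10.5, stable again.
rescaled = run_preconditioned_gd(curvature=21.0, precond=2.0, eta=eta)

print(is_stable(19.0, 1.0, eta), below < 1.0)
print(is_stable(21.0, 1.0, eta), above > 1.0)
print(is_stable(21.0, 2.0, eta), rescaled < 1.0)
```

The third case shows why the abstract speaks of the preconditioned Hessian rather than the raw one: in this toy setting, what decides stability is the curvature after dividing by the preconditioner, not the curvature of the loss itself.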
