all AI news
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms. (arXiv:2205.10287v1 [cs.LG])
May 23, 2022, 1:11 a.m. | Sadhika Malladi, Kaifeng Lyu, Abhishek Panigrahi, Sanjeev Arora
cs.LG updates on arXiv.org arxiv.org
Approximating Stochastic Gradient Descent (SGD) as a Stochastic Differential
Equation (SDE) has allowed researchers to enjoy the benefits of studying a
continuous optimization trajectory while carefully preserving the stochasticity
of SGD. Analogous study of adaptive gradient methods, such as RMSprop and Adam,
has been challenging because there were no rigorously proven SDE approximations
for these methods. This paper derives the SDE approximations for RMSprop and
Adam, giving theoretical guarantees of their correctness as well as
experimental validation of their applicability …
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
Senior ML Researcher - 3D Geometry Processing | 3D Shape Generation | 3D Mesh Data
@ Promaton | Europe
Analytics Engineer
@ CircleCI | Remote (US), Remote (Canada), San Francisco, Denver
Bilingual Executive Assistant/Data Analyst - (French and English) - Export
@ Dangote Group | Lagos, Lagos, Nigeria
Workday Services Data Lead
@ WPP | Mexico City, Mexico
Business Data Analyst
@ Nordea | Tallinn, EE, 11415
Data Integrity Lead
@ BioNTech SE | Gaithersburg, MD, US, MD 20878