Web: http://arxiv.org/abs/2102.04297

May 12, 2022, 1:11 a.m. | Xingyu Wang, Sewoong Oh, Chang-Han Rhee

cs.LG updates on arXiv.org

The empirical success of deep learning is often attributed to SGD's
mysterious ability to avoid sharp local minima in the loss landscape, as sharp
minima are known to lead to poor generalization. Recently, empirical evidence
of heavy-tailed gradient noise was reported in many deep learning tasks, and it
was shown in Şimşekli (2019a,b) that SGD can escape sharp local minima
in the presence of such heavy-tailed gradient noise, providing a partial
solution to the mystery. In this work, we analyze …
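The escape phenomenon described above can be illustrated with a small simulation. Below is a minimal sketch (not the authors' code) comparing SGD-style iterates under Gaussian versus heavy-tailed alpha-stable gradient noise on a toy 1-D loss with a sharp minimum near x = 0 and a flat minimum near x = 4; the loss shape, step size, and noise scale are all illustrative assumptions. With alpha < 2 the noise has power-law tails, so occasional large jumps let the iterate leave the sharp well far sooner than under Gaussian noise.

import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(0)

def grad(x):
    # Gradient of a toy double-well loss (an assumed example):
    # L(x) = -exp(-10 x^2) - exp(-0.25 (x - 4)^2),
    # a narrow, sharp well at x = 0 and a wide, flat well at x = 4.
    sharp = 20.0 * x * np.exp(-10.0 * x ** 2)
    flat = 0.5 * (x - 4.0) * np.exp(-0.25 * (x - 4.0) ** 2)
    return sharp + flat

def run(noise_samples, lr=0.01, scale=0.3):
    # SGD-style iteration with additive gradient noise,
    # starting inside the sharp minimum.
    x = 0.0
    for n in noise_samples:
        x -= lr * (grad(x) + scale * n)
    return x

steps = 5000
gauss_noise = rng.standard_normal(steps)
# alpha = 1.5 gives heavy (power-law) tails; alpha = 2 recovers the Gaussian.
stable_noise = levy_stable.rvs(alpha=1.5, beta=0.0, size=steps, random_state=rng)

print("final iterate, Gaussian noise:    ", run(gauss_noise))
print("final iterate, alpha-stable noise:", run(stable_noise))

Under these settings, the Gaussian run typically remains trapped near x = 0, while the heavy-tailed run tends to end near the flat minimum at x = 4, in line with the escape behavior the abstract describes.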

