Web: https://www.reddit.com/r/MachineLearning/comments/un0crv/r_fullbatch_gd_generalizes_better_than_sgd/

May 11, 2022, 3:25 a.m. | /u/chaotic_shadow4444

Machine Learning reddit.com

This paper [https://arxiv.org/abs/2204.12446](https://arxiv.org/abs/2204.12446) shows how the generalization error depends on the optimization error. For smooth losses (examples: log-sum-exp, or smoothed leaky ReLU activation functions), full-batch GD generalizes better than SGD (or at least better than the known generalization error bounds for SGD).

Additionally, in the over-parametrized (exact-fit) regime the Polyak-Łojasiewicz condition holds ([https://arxiv.org/abs/2003.00307](https://arxiv.org/abs/2003.00307)), and T = C log(n) iterations are required to achieve generalization and excess risk of order 1/n^2. In practice, this should be translated …
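To make the setup concrete, here is a toy sketch (not from the paper) that trains a smooth logistic (log-sum-exp) loss with full-batch GD for T = C log(n) iterations and compares it against SGD on the same data. All constants (C, the learning rate, the data sizes) are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_true)  # labels in {-1, +1}

def loss(w, X, y):
    # smooth logistic loss: mean log(1 + exp(-y * <w, x>)), a log-sum-exp form
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def grad(w, X, y):
    margins = -y * (X @ w)
    s = 1.0 / (1.0 + np.exp(-margins))  # sigmoid of the negative margin
    return (X.T @ (-y * s)) / len(y)

lr, C = 0.5, 10
T = int(C * np.log(n))  # T = C log(n) full-batch iterations

# full-batch gradient descent
w_gd = np.zeros(d)
for _ in range(T):
    w_gd -= lr * grad(w_gd, X, y)

# SGD: one full pass per "iteration", so both methods see comparable
# gradient information over the run
w_sgd = np.zeros(d)
for _ in range(T):
    for i in rng.permutation(n):
        w_sgd -= lr * grad(w_sgd, X[i:i+1], y[i:i+1])

print(f"GD  train loss: {loss(w_gd, X, y):.4f}")
print(f"SGD train loss: {loss(w_sgd, X, y):.4f}")
```

Note this only illustrates the T = C log(n) iteration budget on a smooth loss; the paper's claim is about *generalization* bounds, which a single synthetic train-loss run cannot verify.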
