June 8, 2022, 1:12 a.m. | Tomer Koren, Roi Livni, Yishay Mansour, Uri Sherman

stat.ML updates on arXiv.org

We study to what extent stochastic gradient descent (SGD) may be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to training data. We consider the fundamental stochastic convex optimization framework, where (one-pass, without-replacement) SGD is classically known to minimize the population risk at rate $O(1/\sqrt{n})$, and prove that, surprisingly, there exist problem instances where the SGD solution exhibits both empirical risk and generalization gap of $\Omega(1)$. Consequently, it turns out that …
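
To make the objects in the abstract concrete, here is a minimal sketch, not taken from the paper, of one-pass, without-replacement SGD on a toy stochastic convex problem (least-squares regression), reporting the empirical risk, a held-out estimate of the population risk, and the resulting generalization gap. The dimension, noise level, averaging of iterates, and step size $1/\sqrt{n}$ are illustrative assumptions; on a benign instance like this the gap is small, whereas the paper's point is that convex instances exist where both the empirical risk and the gap of the SGD output remain $\Omega(1)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stochastic convex problem: least squares,
# f(w; x, y) = 0.5 * (w @ x - y)^2, with population minimizer w_star.
d, n, n_test = 10, 2000, 20000
w_star = rng.normal(size=d) / np.sqrt(d)

def sample(m):
    X = rng.normal(size=(m, d))
    y = X @ w_star + 0.1 * rng.normal(size=m)
    return X, y

X_train, y_train = sample(n)
X_test, y_test = sample(n_test)

def risk(w, X, y):
    return 0.5 * np.mean((X @ w - y) ** 2)

# One-pass, without-replacement SGD: each training example is used
# exactly once, in a random order, with step size ~ 1/sqrt(n).
w = np.zeros(d)
iterates = [w.copy()]
eta = 1.0 / np.sqrt(n)
for i in rng.permutation(n):
    grad = (X_train[i] @ w - y_train[i]) * X_train[i]
    w = w - eta * grad
    iterates.append(w.copy())

# Averaged iterate, the standard output for the O(1/sqrt(n)) rate.
w_avg = np.mean(iterates, axis=0)

emp = risk(w_avg, X_train, y_train)   # empirical risk on the training sample
pop = risk(w_avg, X_test, y_test)     # Monte-Carlo estimate of the population risk
print(f"empirical risk:     {emp:.4f}")
print(f"population risk:    {pop:.4f}")
print(f"generalization gap: {pop - emp:.4f}")
```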

arxiv gradient lg stochastic underfitting
