SGD and Weight Decay Provably Induce a Low-Rank Bias in Neural Networks. (arXiv:2206.05794v2 [cs.LG] UPDATED)
Sept. 29, 2022, 1:13 a.m. | Tomer Galanti, Zachary S. Siegel, Aparna Gupte, Tomaso Poggio
stat.ML updates on arXiv.org
We analyze deep ReLU neural networks trained with mini-batch Stochastic Gradient Descent (SGD) and weight decay. We show, both theoretically and empirically, that when a neural network is trained using SGD with weight decay and a small batch size, the resulting weight matrices tend to have low rank. Our analysis relies on a minimal set of assumptions: the networks may be arbitrarily wide or deep and may include residual connections as well as convolutional layers. The same analysis implies the …