SGD and Weight Decay Provably Induce a Low-Rank Bias in Neural Networks. (arXiv:2206.05794v2 [cs.LG] UPDATED)
Sept. 29, 2022, 1:13 a.m. | Tomer Galanti, Zachary S. Siegel, Aparna Gupte, Tomaso Poggio
stat.ML updates on arXiv.org
We analyze deep ReLU neural networks trained with mini-batch Stochastic Gradient Descent (SGD) and weight decay. We show, both theoretically and empirically, that when a neural network is trained using SGD with weight decay and a small batch size, the resulting weight matrices tend to have low rank. Our analysis relies on a minimal set of assumptions: the networks may be arbitrarily wide or deep and may include residual connections as well as convolutional layers. The same analysis implies the …