Sept. 29, 2022, 1:13 a.m. | Tomer Galanti, Zachary S. Siegel, Aparna Gupte, Tomaso Poggio

stat.ML updates on arXiv.org

We analyze deep ReLU neural networks trained with mini-batch Stochastic
Gradient Descent (SGD) and weight decay. We show, both theoretically and
empirically, that when training a neural network using SGD with weight decay
and small batch size, the resulting weight matrices tend to have low rank.
Our analysis relies on a minimal set of assumptions; the neural networks may be
arbitrarily wide or deep and may include residual connections, as well as
convolutional layers. The same analysis implies the …
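The low-rank claim is straightforward to probe empirically. The sketch below is a minimal illustration, not the authors' experimental setup: the synthetic regression task, layer widths, learning rate, weight decay value, step count, and the rank tolerance are all assumptions made for demonstration. It trains a small ReLU MLP with small-batch SGD plus weight decay and reports the effective rank of a hidden weight matrix.

```python
# Minimal sketch (illustrative assumptions throughout, not the paper's setup):
# train a ReLU MLP with small-batch SGD + weight decay, then measure the
# effective rank of a hidden weight matrix.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data (assumption: any simple task suffices here).
X = torch.randn(1024, 32)
y = torch.randn(1024, 1)

model = nn.Sequential(
    nn.Linear(32, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 1),
)

# Small batch size and nonzero weight decay, the regime the abstract describes.
opt = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=5e-4)
loss_fn = nn.MSELoss()

def effective_rank(W, tol=1e-3):
    """Count singular values above tol times the largest singular value."""
    s = torch.linalg.svdvals(W)  # returned in descending order
    return int((s > tol * s[0]).sum())

batch = 8
for step in range(5000):
    idx = torch.randint(0, X.size(0), (batch,))
    opt.zero_grad()
    loss = loss_fn(model(X[idx]), y[idx])
    loss.backward()
    opt.step()

with torch.no_grad():
    W = model[2].weight  # the 256x256 hidden layer
    print(f"effective rank: {effective_rank(W)} / {min(W.shape)}")
```

Rerunning the same script with weight_decay=0.0 or a much larger batch size should, per the paper's claim, leave the effective rank noticeably higher.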

Tags: arxiv, bias, low, networks, neural networks
