Feb. 12, 2024, 5:42 a.m. | Gon Buzaglo, Itamar Harel, Mor Shpigel Nacson, Alon Brutzkus, Nathan Srebro, Daniel Soudry

cs.LG updates on arXiv.org

Background. A central theoretical puzzle is why over-parameterized Neural Networks (NNs) generalize well when trained to zero loss (i.e., so they interpolate the data). Usually, the NN is trained with Stochastic Gradient Descent (SGD) or one of its variants. However, recent empirical work examined the generalization of a random NN that interpolates the data: the NN was sampled from a seemingly uniform prior over the parameters, conditioned on the NN perfectly classifying the training set. Interestingly, such a NN …
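To make the sampling procedure described above concrete, here is a minimal "guess-and-check" sketch (not from the paper): draw weights from a broad, roughly uniform prior and keep the draw only if the resulting network classifies every training point correctly. All names, sizes, and the toy dataset are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny toy training set: 8 points in 2D with +/-1 labels (assumed for illustration).
X = rng.normal(size=(8, 2))
y = np.sign(X[:, 0] - 0.3 * X[:, 1])

def random_mlp(width=16, scale=3.0):
    """Draw one random two-layer ReLU network from a uniform prior on [-scale, scale]."""
    W1 = rng.uniform(-scale, scale, size=(2, width))
    b1 = rng.uniform(-scale, scale, size=width)
    w2 = rng.uniform(-scale, scale, size=width)
    return W1, b1, w2

def predict(params, X):
    W1, b1, w2 = params
    h = np.maximum(X @ W1 + b1, 0.0)  # ReLU hidden layer
    return np.sign(h @ w2)

# Rejection sampling: resample until a network interpolates the training set,
# i.e., the prior conditioned on zero training error.
for attempt in range(1, 200_001):
    params = random_mlp()
    if np.all(predict(params, X) == y):
        print(f"found an interpolating network after {attempt} samples")
        break
```

Such a sampled interpolator is what the empirical work studies in place of an SGD-trained network; the question is whether its generalization resembles that of SGD solutions.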

