Feb. 8, 2024, 5:43 a.m. | Hugo Cui, Luca Pesce, Yatin Dandi, Florent Krzakala, Yue M. Lu, Lenka Zdeborová, Bruno Loureiro

cs.LG updates on arXiv.org

In this manuscript, we investigate how two-layer neural networks learn features from data, and improve over the kernel regime, after being trained with a single gradient descent step. Leveraging a connection with a non-linear spiked matrix model (Ba et al., 2022) and recent progress on Gaussian universality (Dandi et al., 2023), we provide an exact asymptotic description of the generalization error in the high-dimensional limit where the number of samples $n$, the width $p$ and the …
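The setting described above can be illustrated with a minimal numerical sketch: a two-layer network whose first-layer weights take a single full-batch gradient step, after which the second layer is fit by ridge regression on the updated features. All names, scalings, and the single-index teacher below are illustrative assumptions for the sketch, not the paper's exact protocol or asymptotic analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 400, 100, 150          # samples, input dimension, width
eta = np.sqrt(p)                 # a large learning-rate choice (assumption) that can enable feature learning

# Single-index teacher (illustrative): y = tanh(<w*, x> / sqrt(d) scaling folded into w*)
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d) / np.sqrt(d)
y = np.tanh(X @ w_star)

# Random initialization of both layers
W = rng.standard_normal((p, d)) / np.sqrt(d)   # first layer
a = rng.standard_normal(p) / np.sqrt(p)        # second layer

def act(z):
    return np.tanh(z)

def act_prime(z):
    return 1.0 - np.tanh(z) ** 2

# --- one full-batch gradient step on W only (squared loss) ---
pre = X @ W.T / np.sqrt(d)                     # (n, p) preactivations
err = act(pre) @ a - y                         # (n,) residuals
grad_W = ((err[:, None] * act_prime(pre)) * a[None, :]).T @ X / (n * np.sqrt(d))
W1 = W - eta * grad_W

# --- fit the second layer on the updated features by ridge regression ---
Phi = act(X @ W1.T / np.sqrt(d))               # (n, p) feature matrix
lam = 1e-3
a_hat = np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)

# --- generalization error on fresh data ---
X_test = rng.standard_normal((2000, d))
y_test = np.tanh(X_test @ w_star)
y_pred = act(X_test @ W1.T / np.sqrt(d)) @ a_hat
mse = np.mean((y_pred - y_test) ** 2)
```

Comparing `mse` against the same pipeline with the untrained `W` (the random-features / kernel regime) gives a quick empirical sense of the improvement the paper characterizes exactly in the high-dimensional limit.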

cond-mat.dis-nn cs.LG stat.ML
