March 18, 2024, 4:42 a.m. | Markus Gross, Arne P. Raulf, Christoph Räth

cs.LG updates on arXiv.org

arXiv:2311.14120v2 Announce Type: replace
Abstract: We investigate the stationary (late-time) training regime of single- and two-layer underparameterized linear neural networks within the continuum limit of stochastic gradient descent (SGD) for synthetic Gaussian data. For a single-layer network in the weakly underparameterized regime, the spectrum of the noise covariance matrix deviates notably from that of the Hessian, which can be attributed to the broken detailed balance of the SGD dynamics. In this case the weight fluctuations are generally anisotropic, but are …
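The comparison the abstract describes can be reproduced numerically. Below is a minimal sketch (not the authors' code) for the single-layer case: train a linear model by minibatch SGD on synthetic Gaussian data, collect late-time stochastic gradients, and compare the spectrum of their empirical noise covariance with that of the Hessian. All concrete choices (dimension d, sample size n, batch size, learning rate, the covariance Sigma, w_star) are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, batch_size, lr, steps = 10, 1000, 8, 1e-2, 20000  # assumed, illustrative

# Synthetic Gaussian data: x ~ N(0, Sigma), y = w_star . x + observation noise.
Sigma = np.diag(np.linspace(0.5, 2.0, d))
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
w_star = rng.standard_normal(d)
y = X @ w_star + 0.1 * rng.standard_normal(n)

# For the quadratic loss L(w) = mean((X w - y)^2) / 2, the Hessian is X^T X / n.
H = X.T @ X / n

# Run SGD into its stationary (late-time) regime, recording minibatch gradients.
w = rng.standard_normal(d)
grads = []
for t in range(steps):
    idx = rng.integers(0, n, size=batch_size)
    g = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
    w -= lr * g
    if t > steps // 2:  # discard the transient, keep only late-time samples
        grads.append(g)

# Empirical covariance of the stochastic gradient in the stationary regime.
C = np.cov(np.stack(grads).T)

# In the weakly underparameterized regime the two spectra should deviate
# notably, as the abstract attributes to the broken detailed balance of SGD.
print("Hessian eigenvalues:         ", np.sort(np.linalg.eigvalsh(H))[::-1])
print("Noise covariance eigenvalues:", np.sort(np.linalg.eigvalsh(C))[::-1])
```

The setup is underparameterized in the sense the abstract uses (far fewer parameters than data points, d = 10 versus n = 1000), so the empirical Hessian is well conditioned and the spectral comparison is meaningful.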

Subjects: cs.LG; cond-mat.dis-nn; cond-mat.stat-mech
