all AI news
Weight fluctuations in (deep) linear neural networks and a derivation of the inverse-variance flatness relation
March 18, 2024, 4:42 a.m. | Markus Gross, Arne P. Raulf, Christoph R\"ath
cs.LG updates on arXiv.org arxiv.org
Abstract: We investigate the stationary (late-time) training regime of single- and two-layer linear underparameterized neural networks within the continuum limit of stochastic gradient descent (SGD) for synthetic Gaussian data. In the case of a single-layer network in the weakly underparameterized regime, the spectrum of the noise covariance matrix deviates notably from the Hessian, which can be attributed to the broken detailed balance of SGD dynamics. The weight fluctuations are in this case generally anisotropic, but are …
abstract arxiv case cond-mat.dis-nn cond-mat.stat-mech cs.lg data derivation gradient layer linear network networks neural networks stochastic synthetic training type variance
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Risk Management - Machine Learning and Model Delivery Services, Product Associate - Senior Associate-
@ JPMorgan Chase & Co. | Wilmington, DE, United States
Senior ML Engineer (Speech/ASR)
@ ObserveAI | Bengaluru