March 28, 2024, 4:41 a.m. | Sokratis J. Anagnostopoulos, Juan Diego Toscano, Nikolaos Stergiopulos, George Em Karniadakis

cs.LG updates on arXiv.org

arXiv:2403.18494v1 Announce Type: new
Abstract: We investigate the learning dynamics of fully-connected neural networks through the lens of the gradient signal-to-noise ratio (SNR), examining the behavior of first-order optimizers like Adam on non-convex objectives. By interpreting the drift/diffusion phases of information bottleneck theory, with a focus on gradient homogeneity, we identify a third phase termed "total diffusion", characterized by equilibrium in the learning rates and homogeneous gradients. This phase is marked by an abrupt SNR increase, uniform residuals across the sample space …
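As a rough illustration of the gradient SNR the abstract refers to, below is a minimal sketch (not the authors' code) that measures, for each parameter tensor of a small fully-connected network, the ratio of the magnitude of the mean per-batch gradient to its standard deviation across mini-batches. The toy architecture, the synthetic data, the batch size, and the elementwise-then-averaged SNR definition are all illustrative assumptions, since the truncated abstract does not spell out the exact formulation.

```python
# Sketch: batch-gradient signal-to-noise ratio (SNR) for a small MLP.
# For each parameter tensor, SNR = |mean of per-batch gradients| / std
# of per-batch gradients, estimated over several mini-batches.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy fully-connected network and synthetic regression data
# (architecture and data are illustrative assumptions).
model = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
loss_fn = nn.MSELoss()
X = torch.randn(512, 2)
y = torch.sin(X[:, :1]) * torch.cos(X[:, 1:])

def gradient_snr(model, X, y, batch_size=64):
    """Estimate per-parameter gradient SNR across mini-batches."""
    grads = {name: [] for name, _ in model.named_parameters()}
    for start in range(0, len(X), batch_size):
        model.zero_grad()
        loss = loss_fn(model(X[start:start + batch_size]),
                       y[start:start + batch_size])
        loss.backward()
        for name, p in model.named_parameters():
            grads[name].append(p.grad.detach().clone())
    snr = {}
    for name, g_list in grads.items():
        g = torch.stack(g_list)               # (n_batches, *param_shape)
        mean, std = g.mean(dim=0), g.std(dim=0)
        # Elementwise SNR, averaged over the parameter tensor;
        # eps guards against zero-variance components.
        snr[name] = (mean.abs() / (std + 1e-12)).mean().item()
    return snr

for name, value in gradient_snr(model, X, y).items():
    print(f"{name:12s} SNR ~ {value:.3f}")
```

Tracking this quantity over training epochs, rather than at a single snapshot as above, is what would reveal the abrupt SNR increase the abstract associates with the "total diffusion" phase.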

