Web: http://arxiv.org/abs/2201.12052

Jan. 31, 2022, 2:11 a.m. | Bartłomiej Polaczyk, Jacek Cyranka

cs.LG updates on arXiv.org

We study the overparametrization bounds required for the global convergence
of the stochastic gradient descent algorithm for a class of one-hidden-layer
feed-forward neural networks, considering most of the activation functions used
in practice, including ReLU. We improve the existing state-of-the-art results
in terms of the required hidden layer width. We introduce a new proof technique
combining nonlinear analysis with properties of random initializations of the
network. First, we establish the global convergence of continuous solutions of
the differential inclusion …
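The setting the abstract describes can be sketched in a few lines of code: a one-hidden-layer feed-forward network with ReLU activation, randomly initialized, and trained by single-sample SGD on a squared loss. The sketch below is purely illustrative; the width, step size, and synthetic data are assumptions, not values from the paper, and the subgradient comment only hints at why the continuous-time limit is a differential inclusion rather than an ODE.

```python
import numpy as np

# Minimal sketch of the setting in the abstract: a one-hidden-layer ReLU
# network trained by SGD on a squared loss. The width m, step size, and
# synthetic data are illustrative choices, not values from the paper.

rng = np.random.default_rng(0)

n, d, m = 100, 10, 1024          # samples, input dim, hidden width (overparametrized)
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Random initialization of the hidden layer; the paper's proof technique
# relies on properties of such random initializations.
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)   # fixed output weights

def relu(z):
    return np.maximum(z, 0.0)

def predict(W, x):
    return a @ relu(W @ x)

lr, steps = 1e-2, 5000
for t in range(steps):
    i = rng.integers(n)                  # draw one sample (stochastic step)
    z = W @ X[i]
    err = predict(W, X[i]) - y[i]
    # Subgradient of the squared loss w.r.t. W; ReLU is nondifferentiable
    # at 0, which is why the continuous-time limit is a differential
    # inclusion rather than an ordinary gradient flow.
    grad_W = np.outer(err * a * (z > 0), X[i])
    W -= lr * grad_W

loss = 0.5 * np.mean([(predict(W, X[i]) - y[i]) ** 2 for i in range(n)])
print(f"mean squared loss after SGD: {loss:.4f}")
```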
