Improved Overparametrization Bounds for Global Convergence of Stochastic Gradient Descent for Shallow Neural Networks. (arXiv:2201.12052v1 [cs.LG])
Web: http://arxiv.org/abs/2201.12052
cs.LG updates on arXiv.org
We study the overparametrization bounds required for the global convergence of the stochastic gradient descent algorithm for a class of one-hidden-layer feed-forward neural networks, considering most of the activation functions used in practice, including ReLU. We improve the existing state-of-the-art results in terms of the required hidden-layer width. We introduce a new proof technique combining nonlinear analysis with properties of random initializations of the network. First, we establish the global convergence of continuous solutions of the differential inclusion …
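The setting the abstract describes is the classical overparametrized regime: a one-hidden-layer ReLU network trained with SGD from random initialization, where the hidden width m is the quantity the paper's bounds constrain. Below is a minimal sketch of that setting, not the paper's method or experiments; the dimensions, learning rate, batch size, and synthetic data are illustrative assumptions.

```python
import torch

# Minimal sketch: one-hidden-layer ReLU network trained with SGD.
# The paper's bounds concern how large the hidden width m must be
# for SGD to converge globally from random initialization; the
# concrete values below are illustrative only.

torch.manual_seed(0)
n, d, m = 100, 10, 1000          # samples, input dim, hidden width (overparametrized: m >> n)

X = torch.randn(n, d)
y = torch.randn(n, 1)            # synthetic regression targets

model = torch.nn.Sequential(
    torch.nn.Linear(d, m),       # randomly initialized hidden layer
    torch.nn.ReLU(),
    torch.nn.Linear(m, 1),
)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

for step in range(1000):
    idx = torch.randint(0, n, (32,))              # stochastic mini-batch
    loss = ((model(X[idx]) - y[idx]) ** 2).mean() # squared loss
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final training loss: {loss.item():.4f}")
```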