Web: http://arxiv.org/abs/2206.07638

June 16, 2022, 1:11 a.m. | Konstantin Mishchenko, Francis Bach, Mathieu Even, Blake Woodworth

cs.LG updates on arXiv.org

The existing analysis of asynchronous stochastic gradient descent (SGD)
degrades dramatically when any delay is large, giving the impression that
performance depends primarily on the delay. On the contrary, we prove much
better guarantees for the same asynchronous SGD algorithm regardless of the
delays in the gradients, depending instead just on the number of parallel
devices used to implement the algorithm. Our guarantees are strictly better
than the existing analyses, and we also argue that asynchronous SGD outperforms
synchronous minibatch …
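
For readers unfamiliar with the setting, here is a minimal sketch of asynchronous SGD on a toy quadratic objective. The objective, worker count, delay model, and step size are illustrative assumptions, not details taken from the paper: each worker computes a stochastic gradient at a possibly stale copy of the parameters, and the server applies updates as they arrive.

```python
# A minimal sketch of asynchronous SGD with delayed gradients.
# The quadratic objective, worker count, and random-completion delay model
# are illustrative assumptions, not the setup used in the paper.
import numpy as np


def stochastic_grad(x, rng):
    # Stochastic gradient of f(x) = 0.5 * ||x||^2 with additive noise.
    return x + 0.1 * rng.standard_normal(x.shape)


def async_sgd(dim=10, num_workers=4, steps=200, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)  # shared parameters held by the server

    # Each worker starts computing a gradient at the initial point.
    in_flight = [stochastic_grad(x, rng) for _ in range(num_workers)]

    for _ in range(steps):
        # A random worker finishes first; its gradient was computed at
        # whatever (possibly stale) parameters it last read.
        w = int(rng.integers(num_workers))
        g = in_flight[w]

        # The server applies the delayed gradient immediately.
        x = x - lr * g

        # The worker reads the current parameters and starts a new gradient.
        in_flight[w] = stochastic_grad(x, rng)

    return x


if __name__ == "__main__":
    x_final = async_sgd()
    print("final squared norm:", float(np.dot(x_final, x_final)))
```

In this sketch the delay of a gradient is the number of server updates that happened between when the worker read the parameters and when its gradient is applied; the paper's point is that guarantees can depend on the number of workers rather than on how large these delays get.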

