Web: http://arxiv.org/abs/2205.05040

May 11, 2022, 1:11 a.m. | Mingrui Liu, Zhenxun Zhuang, Yunwei Lei, Chunyang Liao

cs.LG updates on arXiv.org

In distributed training of deep neural networks or Federated Learning (FL), one typically runs Stochastic Gradient Descent (SGD) or one of its variants on each
machine and communicates with the other machines periodically. However, SGD can
converge slowly when training some deep neural networks (e.g., RNNs, LSTMs) because
of the exploding gradient issue. Gradient clipping is usually employed to
address this issue in the single-machine setting, but exploring this technique
in the FL setting is still in its infancy: it remains mysterious …
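The abstract describes the standard local-SGD training pattern (each machine runs SGD on its own data and the models are synchronized periodically) combined with per-step gradient clipping. The NumPy sketch below is a minimal, hypothetical illustration of that combination on toy quadratic objectives; the worker count, clipping threshold, local-step schedule, and averaging rule are illustrative assumptions, not the algorithm proposed in the paper.

```python
import numpy as np

def clip_gradient(grad, max_norm):
    """Rescale the gradient so its L2 norm does not exceed max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

def local_clipped_sgd(num_workers=4, local_steps=8, rounds=20,
                      lr=0.1, max_norm=1.0, dim=10, seed=0):
    """Each worker runs clipped SGD on its own quadratic loss
    0.5 * ||x - target_i||^2; after every `local_steps` updates the
    worker models are averaged (the periodic communication step).
    All values here are toy assumptions for illustration."""
    rng = np.random.default_rng(seed)
    targets = rng.normal(size=(num_workers, dim))  # per-worker data
    x = np.zeros(dim)                              # shared initial model
    for _ in range(rounds):
        local_models = []
        for i in range(num_workers):
            xi = x.copy()
            for _ in range(local_steps):
                grad = xi - targets[i]                        # exact local gradient
                grad += rng.normal(scale=0.01, size=dim)      # stochastic noise
                xi -= lr * clip_gradient(grad, max_norm)      # clipped SGD step
            local_models.append(xi)
        x = np.mean(local_models, axis=0)          # communication: model averaging
    return x

if __name__ == "__main__":
    print("final averaged model:", local_clipped_sgd())
```

Under this setup the averaged model drifts toward the mean of the worker targets, while clipping bounds the size of any single update; the clipping threshold trades off robustness to large (exploding) gradients against slower progress when gradients are legitimately large.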

Tags: algorithm, arxiv, communication, deep, distributed, gradient, networks, neural networks, training
