A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks. (arXiv:2205.05040v1 [cs.LG])
Web: http://arxiv.org/abs/2205.05040
cs.LG updates on arXiv.org
In distributed training of deep neural networks and in Federated Learning (FL),
each machine typically runs Stochastic Gradient Descent (SGD) or one of its
variants locally and communicates with the other machines periodically. However,
SGD can converge slowly when training some deep neural networks (e.g., RNNs,
LSTMs) because of the exploding gradient problem. Gradient clipping is usually
employed to address this issue in the single-machine setting, but the technique
is still largely unexplored in the FL setting: it remains unclear …
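The paper's own communication-efficient algorithm is not reproduced here, but the single-machine gradient clipping it builds on can be sketched as follows (PyTorch assumed; the LSTM model, random data, and clipping threshold are illustrative placeholders, not taken from the paper):

```python
import torch
import torch.nn as nn

# Illustrative setup: a small LSTM and random data (placeholders only).
model = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
max_norm = 1.0  # clipping threshold; a tunable hyperparameter in practice

x = torch.randn(4, 10, 8)        # (batch, seq_len, input_size)
target = torch.randn(4, 10, 16)  # matches the LSTM's output shape

for step in range(5):
    optimizer.zero_grad()
    output, _ = model(x)
    loss = loss_fn(output, target)
    loss.backward()
    # Rescale the global gradient norm to at most max_norm before the SGD update,
    # so a single exploding gradient cannot destabilize training.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
```

In a distributed or FL variant, each worker would apply a step like this locally and exchange model updates only periodically, which is where the communication-efficiency question studied in the paper arises.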