Global Momentum Compression for Sparse Communication in Distributed Learning
April 4, 2024, 4:42 a.m. | Chang-Wei Shi, Shen-Yi Zhao, Yin-Peng Xie, Hao Gao, Wu-Jun Li
cs.LG updates on arXiv.org
Abstract: With the rapid growth of data, distributed momentum stochastic gradient descent (DMSGD) has been widely used in distributed learning, especially for training large-scale deep models. Due to network latency and limited bandwidth, communication has become the bottleneck of distributed learning. Communication compression with sparsified gradients, abbreviated as sparse communication, has been widely employed to reduce communication cost. All existing works on sparse communication in DMSGD employ local momentum, in which the momentum only …
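The abstract contrasts the proposed global momentum with the local-momentum baseline, where each worker maintains its own momentum and communicates only a sparsified update. The following single-process Python sketch illustrates that baseline (top-k sparsification of locally accumulated momentum with error feedback); all names, the error-feedback detail, and the toy setup are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (assumed, not from the paper): DMSGD with local momentum
# and top-k sparse communication, simulated on a single machine.
import numpy as np

def top_k(vec, k):
    """Keep the k largest-magnitude entries of vec; zero out the rest."""
    idx = np.argpartition(np.abs(vec), -k)[-k:]
    sparse = np.zeros_like(vec)
    sparse[idx] = vec[idx]
    return sparse

def dmsgd_local_momentum(worker_grads, dim, lr=0.1, beta=0.9, k=2, steps=10):
    """worker_grads: list of callables g_i(w, t) returning a stochastic
    gradient for worker i. Each worker keeps local momentum, sparsifies
    it, and the server averages the sparse updates."""
    n_workers = len(worker_grads)
    w = np.zeros(dim)                               # shared model parameters
    m = [np.zeros(dim) for _ in range(n_workers)]   # local momentum per worker
    e = [np.zeros(dim) for _ in range(n_workers)]   # error-feedback residuals (assumed)
    for t in range(steps):
        updates = []
        for i in range(n_workers):
            g = worker_grads[i](w, t)               # stochastic gradient at worker i
            m[i] = beta * m[i] + g                  # local momentum update
            full = m[i] + e[i]                      # add back what was not sent last round
            sparse = top_k(full, k)                 # communicate only k coordinates
            e[i] = full - sparse                    # keep the residual locally
            updates.append(sparse)
        w -= lr * np.mean(updates, axis=0)          # server aggregates sparse updates
    return w
```

In this baseline the momentum buffers m[i] never leave the workers, which is the "local momentum" property the paper's global momentum compression is positioned against.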