April 4, 2024, 4:42 a.m. | Chang-Wei Shi, Shen-Yi Zhao, Yin-Peng Xie, Hao Gao, Wu-Jun Li

cs.LG updates on arXiv.org arxiv.org

arXiv:1905.12948v3 Announce Type: replace-cross
Abstract: With the rapid growth of data, distributed momentum stochastic gradient descent (DMSGD) has been widely used in distributed learning, especially for training large-scale deep models. Due to the latency and limited bandwidth of the network, communication has become the bottleneck of distributed learning. Communication compression with sparsified gradients, abbreviated as sparse communication, has been widely employed to reduce communication cost. All existing works about sparse communication in DMSGD employ local momentum, in which the momentum only …
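To make the sparse-communication setting concrete, here is a minimal single-worker sketch of momentum SGD with top-k gradient sparsification and a locally kept residual. This is an illustrative pattern only, not the paper's algorithm; the toy quadratic objective, the `top_k` helper, and all hyperparameters are assumptions for demonstration.

import numpy as np

def top_k(vec, k):
    """Keep the k largest-magnitude entries of vec, zero out the rest."""
    out = np.zeros_like(vec)
    idx = np.argpartition(np.abs(vec), -k)[-k:]
    out[idx] = vec[idx]
    return out

rng = np.random.default_rng(0)
d, k, lr, beta = 100, 10, 0.1, 0.9
w = rng.normal(size=d)      # model parameters
u = np.zeros(d)             # local momentum buffer (kept on the worker)
e = np.zeros(d)             # residual of entries not yet communicated

for step in range(200):
    # Noisy gradient of the toy objective f(w) = ||w||^2 / 2 (illustrative only).
    grad = w + 0.01 * rng.normal(size=d)
    u = beta * u + grad             # local momentum update
    corrected = u + e               # add back previously untransmitted entries
    sparse = top_k(corrected, k)    # sparse communication: only k entries are sent
    e = corrected - sparse          # keep the rest locally for later rounds
    w -= lr * sparse                # parameter update from the sparse message

print("final loss:", 0.5 * np.dot(w, w))

In a real multi-worker DMSGD setup, each worker would hold its own momentum and residual buffers and only the sparsified vectors would be exchanged over the network, which is what makes the communication cheap.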

