Jan. 20, 2022, 2:10 a.m. | Shigang Li, Torsten Hoefler

cs.LG updates on arXiv.org

Communication overhead is one of the major obstacles to training large deep
learning models at scale. Gradient sparsification is a promising technique to
reduce the communication volume. However, it is very challenging to obtain real
performance improvement because of (1) the difficulty of achieving a scalable
and efficient sparse allreduce algorithm and (2) the sparsification overhead.
This paper proposes O$k$-Top$k$, a scheme for distributed training with sparse
gradients. O$k$-Top$k$ integrates a novel sparse allreduce algorithm (less than
6$k$ communication volume …
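To make the idea concrete, below is a minimal NumPy sketch of plain Top$k$ gradient sparsification: each worker keeps only the $k$ largest-magnitude gradient entries and would exchange an index/value pair of size O($k$) instead of the full dense gradient. The function names are hypothetical, and this is only an illustration of the generic technique; it is not the paper's O$k$-Top$k$ selection or its sparse allreduce, which additionally bounds the aggregate communication volume to under 6$k$.

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep the k largest-magnitude entries of a flattened gradient.

    Returns (indices, values): a sparse representation of size O(k)
    instead of O(n), which is what reduces the communication volume.
    Illustrative sketch only, not the O$k$-Top$k$ algorithm itself.
    """
    flat = grad.ravel()
    # argpartition finds the positions of the k largest |g| in O(n) time
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def densify(idx, vals, n):
    """Rebuild a dense gradient from the (indices, values) pair."""
    out = np.zeros(n, dtype=vals.dtype)
    out[idx] = vals
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.normal(size=1_000_000)   # a stand-in "gradient" with 1M entries
    k = 1_000                        # keep 0.1% of the entries
    idx, vals = topk_sparsify(g, k)
    # Each worker would send ~2k numbers (indices + values) instead of n;
    # a sparse allreduce then sums these contributions across workers.
    print(f"dense size: {g.size}, sparse size: {idx.size + vals.size}")
```

The challenge the abstract points to is that, across workers, the selected index sets generally differ, so naively summing them inflates the communication volume and the selection itself adds overhead; designing an allreduce that stays close to $k$ in volume is the hard part.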

arxiv, deep learning, distributed learning
