Jan. 31, 2022, 2:11 a.m. | Yucheng Lu, Christopher De Sa

Decentralization is a promising method of scaling up parallel machine
learning systems. In this paper, we provide a tight lower bound on the
iteration complexity for such methods in a stochastic non-convex setting. Our
lower bound reveals a theoretical gap in known convergence rates of many
existing decentralized training algorithms, such as D-PSGD. We prove by
construction that this lower bound is tight and achievable. Motivated by our
insights, we further propose DeTAG, a practical gossip-style decentralized
algorithm that achieves the …
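For readers unfamiliar with gossip-style training, the sketch below shows one common description of D-PSGD, the baseline named above. It is not DeTAG and not the authors' code. In each round, every worker averages its model with its neighbours' models using a doubly stochastic mixing matrix W, then takes a local stochastic gradient step. Several choices are illustrative assumptions: the ring topology, the scalar quadratic local objectives f_i(x) = 0.5 * (x - a_i)^2, the Gaussian gradient noise, and the helper names `ring_mixing_matrix` and `dpsgd`.

```python
import random
from typing import List


def ring_mixing_matrix(n: int) -> List[List[float]]:
    """Hypothetical helper: doubly stochastic gossip matrix for a ring.

    Each worker gives weight 1/3 to itself and to each of its two ring
    neighbours (indices wrap around), so every row and column sums to 1.
    """
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            w[i][j % n] += 1.0 / 3.0
    return w


def dpsgd(targets: List[float], steps: int, lr: float,
          noise: float, seed: int = 0) -> List[float]:
    """Minimal D-PSGD-style sketch on scalar quadratics (assumed setup).

    Worker i holds f_i(x) = 0.5 * (x - targets[i])**2. The global objective
    (the average of the f_i) is minimised at the mean of the targets.
    """
    n = len(targets)
    w = ring_mixing_matrix(n)
    rng = random.Random(seed)
    x = [0.0] * n  # one local model per worker
    for _ in range(steps):
        # Stochastic gradient of each local objective at the worker's own model.
        grads = [x[i] - targets[i] + rng.gauss(0.0, noise) for i in range(n)]
        # Gossip average with neighbours, then apply the local gradient step.
        x = [sum(w[i][j] * x[j] for j in range(n)) - lr * grads[i]
             for i in range(n)]
    return x


if __name__ == "__main__":
    models = dpsgd([1.0, 2.0, 3.0, 4.0, 5.0], steps=2000, lr=0.01, noise=0.1)
    print(models)  # every local model should end up close to 3.0
```

Because W is doubly stochastic, the average of the local models follows plain SGD on the global objective. With a constant step size, the individual models settle in a small neighbourhood of the consensus value.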

