Aug. 30, 2022, 1:11 a.m. | Haoxiang Wang, Zhanhong Jiang, Chao Liu, Soumik Sarkar, Dongxiang Jiang, Young M. Lee

cs.LG updates on arXiv.org

In the context of distributed deep learning, stale weights or gradients can result in poor algorithmic performance. This issue is usually tackled by delay-tolerant algorithms under some mild assumptions on the objective functions and step sizes. In this paper, we propose a different approach and develop a new algorithm called Predicting Clipping Asynchronous Stochastic Gradient Descent (PC-ASGD). Specifically, PC-ASGD has two steps: the predicting step leverages gradient prediction via a Taylor expansion to reduce …
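The excerpt cuts off mid-sentence, but the predicting step's core idea, compensating a stale gradient with a first-order Taylor expansion around the delayed weights, can be sketched independently of the paper's exact update rule. The snippet below is a minimal, hypothetical illustration on a toy quadratic objective; it assumes a cheap diagonal (outer-product) surrogate for the Hessian, as used in other delay-compensated ASGD variants, and all names and constants (predict_gradient, lam, the simulated delay) are illustrative, not taken from PC-ASGD.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares objective: f(w) = 0.5 * ||A w - b||^2, so grad f(w) = A^T (A w - b).
A = rng.standard_normal((40, 8))
b = rng.standard_normal(40)

def grad(w):
    return A.T @ (A @ w - b)

def predict_gradient(stale_grad, w_now, w_stale, lam=0.1):
    # First-order Taylor prediction: g(w_now) ~= g(w_stale) + H (w_now - w_stale),
    # with the Hessian action replaced by a diagonal outer-product surrogate
    # lam * g * g * (w_now - w_stale). This surrogate is an assumption for the
    # sketch, not PC-ASGD's actual predicting step.
    return stale_grad + lam * stale_grad * stale_grad * (w_now - w_stale)

w = np.zeros(8)
lr, delay = 0.002, 4
history = [w.copy()]

for step in range(300):
    # Pretend this gradient came from a worker whose weights are `delay` steps old.
    w_stale = history[max(0, len(history) - 1 - delay)]
    g_pred = predict_gradient(grad(w_stale), w, w_stale)
    w = w - lr * g_pred
    history.append(w.copy())

print("final loss:", 0.5 * np.linalg.norm(A @ w - b) ** 2)
```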

arxiv asynchronous distributed learning time training
