Web: http://arxiv.org/abs/2206.11357

June 24, 2022, 1:10 a.m. | Xiaoxuan Liu, Lianmin Zheng, Dequan Wang, Yukuo Cen, Weize Chen, Xu Han, Jianfei Chen, Zhiyuan Liu, Jie Tang, Joey Gonzalez, Michael Mahoney, Alvin Ch

cs.LG updates on arXiv.org

Training large neural network (NN) models requires extensive memory
resources, and Activation Compressed Training (ACT) is a promising approach to
reducing the training memory footprint. This paper presents GACT, an ACT framework
that supports a broad range of machine learning tasks for generic NN architectures
with limited domain knowledge. By analyzing a linearized version of ACT's
approximate gradient, we prove the convergence of GACT without prior knowledge
of operator type or model architecture. To make training stable, we propose an
algorithm …
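ACT saves a compressed copy of each layer's activations during the forward pass and decompresses it only when the backward pass needs it. The sketch below shows one common building block for this, stochastic-rounding quantization of an activation tensor to a few bits. It is a minimal illustration under stated assumptions, not GACT's algorithm: the per-tensor min/max range, the function names `quantize`, `dequantize` and `compression_ratio`, and the storage model are all hypothetical choices made for this example. Rounding up with probability equal to the fractional part keeps the dequantized value unbiased in expectation, which is the kind of property an analysis of ACT's approximate gradient relies on.

```python
import random
from typing import List, Tuple


def quantize(x: List[float], bits: int, rng: random.Random) -> Tuple[List[int], float, float]:
    """Hypothetical helper: stochastically round x to `bits`-bit integer codes.

    Uses a single min/max range for the whole tensor (an assumption; real ACT
    frameworks may use finer-grained groups and per-tensor bit allocation).
    """
    levels = (1 << bits) - 1
    lo, hi = min(x), max(x)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = []
    for v in x:
        t = (v - lo) / scale        # position on the grid, in [0, levels]
        floor_t = int(t)            # t >= 0, so int() acts as floor
        frac = t - floor_t
        # Round up with probability `frac`, so E[code] == t (unbiased).
        code = floor_t + (1 if rng.random() < frac else 0)
        codes.append(min(code, levels))  # guard against float drift at the top
    return codes, lo, scale


def dequantize(codes: List[int], lo: float, scale: float) -> List[float]:
    """Reconstruct approximate activations for use in the backward pass."""
    return [lo + c * scale for c in codes]


def compression_ratio(n: int, bits: int) -> float:
    """fp32 storage vs. `bits`-bit codes plus two fp32 values (lo, scale)."""
    return (32 * n) / (bits * n + 64)


# Usage: compress 1024 synthetic "activations" to 2 bits each.
rng = random.Random(0)
acts = [rng.gauss(0.0, 1.0) for _ in range(1024)]
codes, lo, scale = quantize(acts, bits=2, rng=rng)
approx = dequantize(codes, lo, scale)
print(f"max abs error: {max(abs(a - b) for a, b in zip(acts, approx)):.3f}")
print(f"compression vs fp32: {compression_ratio(len(acts), 2):.1f}x")
```

Each element's reconstruction error is below one quantization step (`scale`), and averaging many independent compressions of the same tensor converges to the original values. Fewer bits save more memory but add more gradient variance, which is why the choice of bit width per tensor matters.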


