Reducing Activation Recomputation in Large Transformer Models. (arXiv:2205.05198v1 [cs.LG])
Web: http://arxiv.org/abs/2205.05198
May 12, 2022, 1:10 a.m. | Vijay Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, Bryan Catanzaro
cs.CL updates on arXiv.org
Training large transformer models is one of the most important computational challenges of modern AI. In this paper, we show how to significantly accelerate training of large transformer models by reducing activation recomputation. Activation recomputation is commonly used to work around memory capacity constraints. Rather than storing activations for backpropagation, they are traditionally recomputed, which saves memory but adds redundant compute. In this work, we show most of this redundant compute is unnecessary because we can reduce memory consumption sufficiently …
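For context on the technique the paper targets, below is a minimal PyTorch sketch of conventional activation recomputation (gradient checkpointing), where a block's intermediate activations are discarded during the forward pass and recomputed during the backward pass. The toy module, layer sizes, and tensor shapes are illustrative assumptions, not the authors' Megatron-LM implementation.

# A minimal sketch (assumed example, not from the paper) of activation
# recomputation via torch.utils.checkpoint: activations inside the
# checkpointed block are not stored in the forward pass and are
# recomputed in the backward pass, trading extra compute for less memory.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class TransformerBlock(nn.Module):
    """Toy transformer layer; dimensions are illustrative only."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model),
                                 nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        return x

block = TransformerBlock()
x = torch.randn(4, 128, 512, requires_grad=True)

# Standard forward: all intermediate activations are kept for backprop.
y_full = block(x)

# Checkpointed forward: only the block's input is saved; intermediate
# activations are recomputed when .backward() runs, lowering peak memory.
y_ckpt = checkpoint(block, x, use_reentrant=False)
y_ckpt.sum().backward()

The paper's contribution is to show that much of this recomputation can be avoided by reducing activation memory through other means, so the redundant forward passes above become largely unnecessary.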