Web: http://arxiv.org/abs/2205.05198

May 12, 2022, 1:11 a.m. | Vijay Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, Bryan Catanzaro

cs.LG updates on arXiv.org

Training large transformer models is one of the most important computational
challenges of modern AI. In this paper, we show how to significantly accelerate
training of large transformer models by reducing activation recomputation.
Activation recomputation is commonly used to work around memory capacity
constraints. Rather than storing activations for backpropagation, they are
traditionally recomputed, which saves memory but adds redundant compute. In
this work, we show most of this redundant compute is unnecessary because we can
reduce memory consumption sufficiently …
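
For context, the following is a minimal sketch, not taken from the paper, of how activation recomputation (activation checkpointing) is commonly applied in PyTorch using torch.utils.checkpoint: intermediate activations inside each checkpointed block are discarded during the forward pass and recomputed during backpropagation, trading extra compute for lower memory. The model shape and layer sizes here are illustrative assumptions; the paper's contribution is to make most of this recomputation unnecessary, not to implement it this way.

```python
# Illustrative sketch of standard activation recomputation in PyTorch.
# Not the paper's method; dimensions and block structure are assumptions.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    """A simple residual feed-forward block standing in for a transformer layer."""

    def __init__(self, dim: int = 1024):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.ff(x)


blocks = nn.ModuleList(Block() for _ in range(8))
x = torch.randn(4, 128, 1024, requires_grad=True)  # (batch, sequence, hidden)

h = x
for block in blocks:
    # Activations inside each block are not stored; they are recomputed
    # when backward() needs them. This recomputation is the redundant
    # compute the paper aims to reduce.
    h = checkpoint(block, h, use_reentrant=False)

h.sum().backward()
```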
