Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging. (arXiv:2209.14981v2 [cs.LG] UPDATED)
Oct. 7, 2022, 1:14 a.m. | Jean Kaddour
stat.ML updates on arXiv.org arxiv.org
Training vision or language models on large datasets can take days, if not weeks. We show that averaging the weights of the k latest checkpoints, each collected at the end of an epoch, can speed up the training progression in terms of loss and accuracy by dozens of epochs, corresponding to time savings of up to ~68 and ~30 GPU hours when training a ResNet50 on ImageNet and a RoBERTa-Base model on WikiText-103, respectively. We also provide the code and model checkpoint …
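The core idea, averaging the k most recent end-of-epoch checkpoints, can be sketched in a few lines. This is a minimal illustration, not the authors' released code: checkpoints are shown as plain dicts of float lists, the class name `LatestWeightAverager` is made up for this example, and a real implementation would operate on framework tensors.

```python
from collections import deque

class LatestWeightAverager:
    """Keeps the k most recent checkpoints and returns their uniform average."""

    def __init__(self, k):
        self.checkpoints = deque(maxlen=k)  # oldest checkpoint is evicted

    def update(self, state_dict):
        # Called at the end of each epoch with a copy of the model weights.
        self.checkpoints.append(state_dict)

    def average(self):
        # Uniform average over the stored checkpoints, parameter by parameter.
        n = len(self.checkpoints)
        avg = {}
        for name, first in self.checkpoints[0].items():
            avg[name] = [
                sum(ckpt[name][i] for ckpt in self.checkpoints) / n
                for i in range(len(first))
            ]
        return avg

# Toy usage: three "epochs" of a single two-parameter layer, keeping k=2.
lawa = LatestWeightAverager(k=2)
lawa.update({"w": [1.0, 2.0]})
lawa.update({"w": [3.0, 4.0]})
lawa.update({"w": [5.0, 6.0]})  # the first checkpoint drops out
print(lawa.average())  # {'w': [4.0, 5.0]}
```

The averaged weights are used for evaluation; training continues from the latest (un-averaged) checkpoint, which is what distinguishes this from schemes that average over the whole trajectory.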