Oct. 7, 2022, 1:14 a.m. | Jean Kaddour

stat.ML updates on arXiv.org arxiv.org

Training vision or language models on large datasets can take days, if not
weeks. We show that averaging the weights of the k latest checkpoints, each
collected at the end of an epoch, can speed up the training progression in
terms of loss and accuracy by dozens of epochs, corresponding to time savings
of up to ~68 and ~30 GPU hours when training a ResNet50 on ImageNet and a
RoBERTa-Base model on WikiText-103, respectively. We also provide the code and
model checkpoint …
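
To make the averaging step concrete, here is a minimal sketch of keeping the k latest end-of-epoch checkpoints and averaging their weights. It is not the authors' released code: the class name `LatestCheckpointAverager`, the use of plain Python lists in place of framework tensors, and the choice of a uniform (unweighted) mean are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Dict, List

# Hypothetical representation: parameter name -> flattened list of values.
Weights = Dict[str, List[float]]


class LatestCheckpointAverager:
    """Keeps the k most recent end-of-epoch checkpoints and averages them."""

    def __init__(self, k: int) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        # A bounded deque drops the oldest checkpoint once k are stored.
        self.window: Deque[Weights] = deque(maxlen=k)

    def add(self, weights: Weights) -> None:
        # Store a copy so later in-place updates do not alter the snapshot.
        self.window.append({name: list(vals) for name, vals in weights.items()})

    def average(self) -> Weights:
        # Assumption: a uniform mean over the stored checkpoints.
        if not self.window:
            raise ValueError("no checkpoints collected yet")
        n = len(self.window)
        first = self.window[0]
        return {
            name: [
                sum(ckpt[name][i] for ckpt in self.window) / n
                for i in range(len(first[name]))
            ]
            for name in first
        }


averager = LatestCheckpointAverager(k=3)
for epoch in range(1, 6):
    # Stand-in for the model weights at the end of each epoch.
    averager.add({"w": [float(epoch), 2.0 * epoch]})

averaged = averager.average()
print(averaged)  # {'w': [4.0, 8.0]}, the mean of epochs 3, 4 and 5
```

With k = 3, only the checkpoints from epochs 3, 4 and 5 remain in the window after five epochs, so the result is their element-wise mean. In a real training loop the same idea would apply to a framework's parameter dictionary rather than to lists.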

