April 28, 2023, 1:37 p.m. | /u/Tiny-Entertainer-346

Natural Language Processing www.reddit.com

I am trying to train a T5 model. This is what my training arguments look like:

args = Seq2SeqTrainingArguments(
    model_dir,
    evaluation_strategy="steps",
    eval_steps=100,
    logging_strategy="steps",
    logging_steps=100,
    save_strategy="steps",
    save_steps=200,
    learning_rate=4e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=10,
    predict_with_generate=True,
    fp16=True,
    load_best_model_at_end=True,
    metric_for_best_model="rouge1",
    report_to="tensorboard"
)

My model trained for 7600 steps, but the last checkpoint saved was checkpoint-1800:

[trainer screenshot](https://i.stack.imgur.com/MBoFu.png)

Why is this so?
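My current understanding is that with `save_total_limit=3` the Trainer rotates checkpoints, deleting the oldest ones, and with `load_best_model_at_end=True` it protects the best checkpoint (by `metric_for_best_model`) from deletion. Here is a hypothetical sketch of that rotation logic as I understand it (the `rotate_checkpoints` function and the protection behavior are my assumptions, not the actual Trainer code):

```python
def rotate_checkpoints(saved_steps, save_total_limit, best_step=None):
    """Return the checkpoint steps that would survive rotation.

    Assumption: the oldest checkpoints are deleted first, except the
    best checkpoint, which is protected from deletion.
    """
    kept = list(saved_steps)
    while len(kept) > save_total_limit:
        for step in sorted(kept):
            if step != best_step:
                kept.remove(step)  # delete the oldest non-best checkpoint
                break
    return sorted(kept)

# With save_steps=200 over 7600 steps, saves happen at 200, 400, ..., 7600.
steps = list(range(200, 7601, 200))
print(rotate_checkpoints(steps, save_total_limit=3, best_step=1800))
# → [1800, 7400, 7600]
```

If that sketch is right, checkpoint-1800 surviving would mean it scored the best `rouge1` on the eval set, and the screenshot simply shows it alongside the most recent checkpoints.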
