Huggingface not saving model checkpoint
April 28, 2023, 1:37 p.m. | /u/Tiny-Entertainer-346
Natural Language Processing www.reddit.com
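from transformers import Seq2SeqTrainingArguments

# model_dir and batch_size are defined earlier in the training script (not shown).
args = Seq2SeqTrainingArguments(
    model_dir,
    evaluation_strategy="steps",
    eval_steps=100,            # evaluate every 100 steps
    logging_strategy="steps",
    logging_steps=100,
    save_strategy="steps",
    save_steps=200,            # must be a multiple of eval_steps when load_best_model_at_end=True
    learning_rate=4e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    weight_decay=0.01,
    save_total_limit=3,        # keep at most 3 checkpoints on disk
    num_train_epochs=10,
    predict_with_generate=True,
    fp16=True,
    load_best_model_at_end=True,
    metric_for_best_model="rouge1",
    report_to="tensorboard",
)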
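With these settings, a checkpoint is written every 200 steps (checkpoint-200, checkpoint-400 and so on), and save_total_limit=3 caps how many remain on disk. The transformers documentation for save_total_limit states that when load_best_model_at_end is enabled, the best checkpoint according to metric_for_best_model is always retained in addition to the most recent ones. The function below is a simplified model of that documented rule, not the library's implementation. `retained_checkpoints` and `best_step` are hypothetical names, and the model assumes every save succeeded and that the final step is a multiple of save_steps.

```python
def retained_checkpoints(total_steps, save_steps, save_total_limit, best_step=None):
    """Return the checkpoint steps left on disk under a simplified retention model.

    Assumption: models the documented save_total_limit behaviour with
    load_best_model_at_end=True (the best checkpoint is always kept alongside
    the most recent ones). This is NOT the transformers source code.
    Assumes save_total_limit >= 1.
    """
    # Every step at which a checkpoint-<step> folder would have been written.
    saved = list(range(save_steps, total_steps + 1, save_steps))

    # No best checkpoint, or the best one is also the newest: keep the newest N.
    if best_step is None or best_step not in saved or best_step == saved[-1]:
        return saved[-save_total_limit:]

    # Otherwise the best checkpoint takes one slot and the rest go to the newest ones.
    # With save_total_limit=1 the docs note two checkpoints may survive (last and best).
    others = [step for step in saved if step != best_step]
    n_recent = max(save_total_limit - 1, 1)
    return sorted(others[-n_recent:] + [best_step])
```

For example, `retained_checkpoints(7600, 200, 3, best_step=1800)` returns `[1800, 7400, 7600]`. In other words, if checkpoint-1800 happened to have the best rouge1 score, it would stay on disk next to the two newest checkpoints as the best model, not as the last one saved.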
My model trained for 7600 steps, but the last saved checkpoint was checkpoint 1800:
[trainer screenshot](https://i.stack.imgur.com/MBoFu.png)
Why is this so?
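One way to see which checkpoints actually exist, and which one the Trainer recorded as best, is to read the trainer_state.json file that the Trainer writes into each checkpoint folder (its best_model_checkpoint and best_metric fields). The sketch below assumes the default checkpoint-<step> folder naming inside model_dir. `checkpoint_summary` is a hypothetical helper, not part of transformers.

```python
import json
import re
from pathlib import Path


def checkpoint_summary(model_dir):
    """List checkpoint-<step> folders in model_dir and read the newest trainer_state.json."""
    steps = []
    for path in Path(model_dir).iterdir():
        # Only consider directories named exactly checkpoint-<digits>; skips e.g. a tensorboard "runs" folder.
        match = re.fullmatch(r"checkpoint-(\d+)", path.name)
        if match and path.is_dir():
            steps.append(int(match.group(1)))
    # Sort numerically: a string sort would put checkpoint-200 after checkpoint-1800.
    steps.sort()

    summary = {"checkpoints": steps, "best_model_checkpoint": None, "best_metric": None}
    if steps:
        state_file = Path(model_dir) / f"checkpoint-{steps[-1]}" / "trainer_state.json"
        if state_file.exists():
            state = json.loads(state_file.read_text())
            summary["best_model_checkpoint"] = state.get("best_model_checkpoint")
            summary["best_metric"] = state.get("best_metric")
    return summary
```

Calling `checkpoint_summary(model_dir)` after training shows whether the newest checkpoint on disk is beyond step 1800, and whether best_model_checkpoint points at checkpoint-1800.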