May 8, 2024, 4:41 p.m. | /u/ApartmentEither4838

Machine Learning www.reddit.com

Loss-curve screenshot: https://preview.redd.it/z5wmyi0nb8zc1.png?width=599&format=png&auto=webp&s=97e108bd749f9cf0874759f7ba0b8aafb3260640

Today I was training a small (11.07 million parameter) GPT model on a text dataset and came across this loss curve. Is there any explanation for why the loss first plateaus around 2.4 and then starts to fall exponentially thereafter? Also, why is there a sudden spike at around 1200 steps?

- The dataset I am using is the entire novel "One Hundred Years of Solitude"
- The total token count in the …
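(Not necessarily what happened in this run, but for context: sudden loss spikes in small-model training are often attributed to an occasional oversized gradient step, and the standard mitigation is gradient-norm clipping. A minimal NumPy sketch of the clipping rule, with a hypothetical `clip_grad_norm` helper:)

```python
import numpy as np

def clip_grad_norm(grads, max_norm):
    """Scale a list of gradient arrays so their global L2 norm is at most max_norm.

    Mirrors the usual "clip by global norm" rule: compute the norm over all
    parameter gradients together, then rescale every gradient by the same factor.
    """
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-12))  # no-op when already small
    return [g * scale for g in grads], total_norm

# A spiky gradient (global norm 5.0) gets scaled down to the cap:
grads = [np.array([3.0, 4.0])]
clipped, norm = clip_grad_norm(grads, max_norm=1.0)
print(norm)                        # 5.0
print(np.linalg.norm(clipped[0]))  # ~1.0
```

In a PyTorch training loop the equivalent one-liner is `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)` between `loss.backward()` and `optimizer.step()`.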

