March 31, 2024, 3:41 p.m. | /u/toroidmax

Deep Learning www.reddit.com

I was trying to replicate the results from the [Grokking paper](https://arxiv.org/abs/2201.02177). As per the paper, if an over-parameterised neural net is trained well beyond the point of over-fitting, it eventually starts generalising. I used [nanoGPT](https://github.com/karpathy/ng-video-lecture) from Andrej Karpathy for this experiment. In experiment 1 [Grok-0], the model started over-fitting after ~70 steps: you can see the val loss [in grey] increasing while the train loss goes down to zero. However, the val loss never decreased.
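For anyone who wants to poke at the same phenomenon without the full nanoGPT stack, here is a minimal sketch of a grokking-style run on the modular-addition task from the paper (a + b mod p). To be clear, this is an illustrative reconstruction, not my actual setup: the model, modulus, weight decay, and step count are all assumptions, chosen to be in the ballpark of what the paper reports.

```python
# Minimal grokking-style sketch: over-parameterised net on (a + b) mod p.
# All hyperparameters here are illustrative assumptions, not the run above.
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 97  # modulus; tokens are 0..p-1

# Enumerate all (a, b) pairs and split 50/50 into train/val.
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
n_train = len(pairs) // 2
train_idx, val_idx = perm[:n_train], perm[n_train:]

class TinyNet(nn.Module):
    """Over-parameterised MLP over embedded (a, b) token pairs."""
    def __init__(self, p, d=128):
        super().__init__()
        self.embed = nn.Embedding(p, d)
        self.mlp = nn.Sequential(
            nn.Linear(2 * d, 512), nn.ReLU(), nn.Linear(512, p)
        )

    def forward(self, x):             # x: (batch, 2) token ids
        e = self.embed(x).flatten(1)  # (batch, 2*d)
        return self.mlp(e)

model = TinyNet(p)
# Weight decay matters: the paper reports it strongly speeds up grokking.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(20_000):  # run far past the point where train loss hits ~0
    model.train()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(model(pairs[val_idx]), labels[val_idx])
        print(f"step {step:6d}  train {loss.item():.4f}  val {val_loss.item():.4f}")
```

The pattern to look for is the one described above: train loss collapses early, val loss stays high (or rises) for a long stretch, and only much later does val loss drop.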

For experiment 2 [Grok-1], I increased the model size [embed dim and number of blocks]. …
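If you want to reproduce that scaling step in the linked repo, model size in Karpathy's gpt.py is controlled by a few module-level hyperparameters. The exact values I used aren't in the post, so the numbers below are hypothetical:

```python
# Hypothetical scaled-up config for experiment 2, assuming nanoGPT-style
# module-level hyperparameters (names as in Karpathy's gpt.py).
n_embd = 256   # embedding dimension, up from the baseline
n_layer = 8    # number of transformer blocks, up from the baseline
n_head = 8     # attention heads; n_embd must be divisible by n_head
```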
