March 31, 2024, 3:41 p.m. | /u/toroidmax

Deep Learning www.reddit.com

I was trying to replicate the results from the [Grokking paper](https://arxiv.org/abs/2201.02177). As per the paper, if an over-parameterised neural net is trained well beyond the point of over-fitting, it eventually starts generalising. I used [nanoGPT](https://github.com/karpathy/ng-video-lecture) from Andrej Karpathy for this experiment. In experiment 1 [Grok-0], the model started over-fitting after ~70 steps: you can see the val loss [in grey] increasing while the train loss goes down to zero. However, the val loss never decreased.
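For anyone who wants to poke at the same phenomenon without the full nanoGPT stack, here is a minimal sketch of a grokking-style run on the modular-addition task from the paper (a + b mod p). To be clear, this is an illustrative reconstruction, not my actual setup: the model, modulus, weight decay, and step count are all assumptions, chosen to be in the ballpark of what the paper reports.

```python
# Minimal grokking-style sketch: over-parameterised net on (a + b) mod p.
# All hyperparameters here are illustrative assumptions, not the run above.
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 97  # modulus; tokens are 0..p-1

# Enumerate all (a, b) pairs and split 50/50 into train/val.
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
n_train = len(pairs) // 2
train_idx, val_idx = perm[:n_train], perm[n_train:]

class TinyNet(nn.Module):
    """Over-parameterised MLP over embedded (a, b) token pairs."""
    def __init__(self, p, d=128):
        super().__init__()
        self.embed = nn.Embedding(p, d)
        self.mlp = nn.Sequential(
            nn.Linear(2 * d, 512), nn.ReLU(), nn.Linear(512, p)
        )

    def forward(self, x):             # x: (batch, 2) token ids
        e = self.embed(x).flatten(1)  # (batch, 2*d)
        return self.mlp(e)

model = TinyNet(p)
# Weight decay matters: the paper reports it strongly speeds up grokking.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(20_000):  # run far past the point where train loss hits ~0
    model.train()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(model(pairs[val_idx]), labels[val_idx])
        print(f"step {step:6d}  train {loss.item():.4f}  val {val_loss.item():.4f}")
```

The pattern to look for is the one described above: train loss collapses early, val loss stays high (or rises) for a long stretch, and only much later does val loss drop.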

For experiment 2 [Grok-1], I increased the model size [embed dim and number of blocks]. …
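If you want to reproduce that scaling step in the linked repo, model size in Karpathy's gpt.py is controlled by a few module-level hyperparameters. The exact values I used aren't in the post, so the numbers below are hypothetical:

```python
# Hypothetical scaled-up config for experiment 2, assuming nanoGPT-style
# module-level hyperparameters (names as in Karpathy's gpt.py).
n_embd = 256   # embedding dimension, up from the baseline
n_layer = 8    # number of transformer blocks, up from the baseline
n_head = 8     # attention heads; n_embd must be divisible by n_head
```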
