March 31, 2024, 3:41 p.m. | /u/toroidmax

Deep Learning www.reddit.com

I was trying to replicate results from the [Grokking paper](https://arxiv.org/abs/2201.02177). As per the paper, if an over-parameterised neural net is trained well beyond the point of over-fitting, it eventually starts generalising. I used [nanoGPT](https://github.com/karpathy/ng-video-lecture) from Andrej Karpathy for this experiment. In experiment 1 [Grok-0], the model started over-fitting after ~70 steps: you can see the val loss [in grey] increasing while the train loss goes down to zero. However, the val loss never decreased.
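For anyone who wants to poke at the same effect, here is a minimal sketch of the setup on the paper's modular-addition task. It swaps the transformer for a small embedding-plus-MLP stand-in to keep the code self-contained, and all hyperparameters are illustrative rather than the values from these runs; the ingredients the paper associates with grokking are the small train/val split of the full operation table, the weight decay, and the very long training run.

```python
# Minimal grokking sketch, assuming the paper's modular-addition task
# (a + b mod p). A small embedding + MLP stands in for the transformer
# to keep this self-contained; all hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
p = 97  # modulus; the paper uses a prime of roughly this size

# every (a, b) pair and its label (a + b) % p
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p

# 50/50 train/val split of the full table, as in the paper's data-fraction runs
perm = torch.randperm(p * p)
tr, va = perm[: p * p // 2], perm[p * p // 2 :]

class TinyNet(nn.Module):
    """Embed both operands, concatenate, classify the sum mod p."""
    def __init__(self, d=128):
        super().__init__()
        self.emb = nn.Embedding(p, d)
        self.mlp = nn.Sequential(
            nn.Linear(2 * d, 256), nn.ReLU(), nn.Linear(256, p)
        )

    def forward(self, ab):
        return self.mlp(self.emb(ab).flatten(1))  # (N, p) logits

model = TinyNet()
# weight decay matters here: the paper ties grokking to regularisation pressure
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

for step in range(100_000):  # grokking shows up long after train loss hits ~0
    idx = tr[torch.randint(len(tr), (512,))]
    loss = F.cross_entropy(model(pairs[idx]), labels[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 1_000 == 0:
        with torch.no_grad():
            val_acc = (model(pairs[va]).argmax(-1) == labels[va]).float().mean()
        print(f"step {step:6d}  train_loss {loss.item():.4f}  val_acc {val_acc:.3f}")
```

Watching val accuracy rather than val loss makes the delayed jump easier to spot: train loss saturates within a few hundred steps, while val accuracy can sit near chance for thousands of steps before climbing.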

For experiment 2 [Grok-1], I increased the model size [embed dim and number of blocks]. …
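The exact configs aren't shown in the post, but in nanoGPT terms the scaling between the two runs would look something like the following; the field names follow nanoGPT's GPTConfig, and the values are placeholders, not the ones actually used.

```python
# Hypothetical configs for the two runs; the real values are not in the post.
# Field names follow nanoGPT's GPTConfig (n_layer = number of blocks,
# n_embd = embedding dimension).
grok0 = dict(n_layer=2, n_head=4, n_embd=128)  # experiment 1 baseline (assumed)
grok1 = dict(n_layer=4, n_head=4, n_embd=256)  # experiment 2: deeper and wider (assumed)
```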
