June 4, 2024, noon | code_your_own_AI

code_your_own_AI www.youtube.com

Grokking is a distinct late phase in the training of LLMs. Starting with arithmetic operations, we analyze the patterns that emerge in the embedding space of Transformers.

Grokking refers to a phenomenon where, after extensive training beyond typical saturation points, transformers can generalize effectively to unseen data, achieving high performance long after initial overfitting occurs. This discovery challenges conventional wisdom about early stopping to prevent overfitting, revealing that extended training can lead to superior generalization. The video highlights various studies demonstrating this effect, …
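To make the setup concrete, here is a minimal sketch of the kind of arithmetic task on which this effect is usually studied: every pair (a, b) labeled with (a + b) mod p, with only a fraction of the pairs used for training. The function name `modular_addition_dataset`, the modulus 97, and the 30% training fraction are illustrative assumptions, not values taken from the video.

```python
import random


def modular_addition_dataset(p=97, train_fraction=0.3, seed=0):
    """Build all (a, b) -> (a + b) % p examples and split them.

    The defaults (p=97, 30% train) are hypothetical choices made for
    illustration only; they are not taken from the video.
    """
    # Enumerate the full table of p * p input pairs with their labels.
    examples = [((a, b), (a + b) % p) for a in range(p) for b in range(p)]

    # Shuffle with a seeded generator so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(examples)

    # The held-out pairs are never seen during training, so accuracy on
    # them measures generalization rather than memorization.
    n_train = int(len(examples) * train_fraction)
    return examples[:n_train], examples[n_train:]


train, test = modular_addition_dataset()
print(len(train), len(test))
```

In a setup like this, the quantity of interest would be accuracy on the held-out pairs tracked over many training steps after accuracy on the training pairs has already saturated.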

