March 10, 2024, 6:12 p.m. | /u/Wats0ns

Machine Learning www.reddit.com

Hello,

I've been reading explanations of LoRA for hours now, and there is something I can't wrap my head around: the memory gains. I understand that a lot is saved because optimizer states are not needed for the frozen layers.

However, the LoRA paper (Section 4.2) states:

>We also observe a 25% speedup during training on GPT-3 175B compared to full fine-tuning as we do not need to calculate the gradient for the vast majority …
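
For concreteness, here is roughly how I understand the setup: a minimal PyTorch sketch (not from the paper; the class name, rank, and layer sizes are just illustrative), where only the low-rank A/B matrices are trainable and therefore only they get gradients and optimizer states.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Illustrative LoRA wrapper: frozen base weight plus a trainable low-rank update."""
        def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            self.base.weight.requires_grad_(False)      # frozen: no grads, no optimizer states
            if self.base.bias is not None:
                self.base.bias.requires_grad_(False)
            self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
            self.scaling = alpha / r

        def forward(self, x):
            # y = base(x) + (x A^T) B^T * scaling
            return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

    layer = LoRALinear(nn.Linear(4096, 4096), r=8)
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(f"trainable: {trainable:,} / total: {total:,}")  # ~65k vs ~16.8M

    # Adam keeps two extra tensors (m, v) per *trainable* parameter, so optimizer
    # state for this layer shrinks from ~2 x 16.8M floats to ~2 x 65k floats.
    opt = torch.optim.Adam((p for p in layer.parameters() if p.requires_grad), lr=1e-4)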
