May 31, 2023, 9:03 a.m. | /u/Balance-

Machine Learning www.reddit.com

This paper presents a memory-efficient zeroth-order optimizer (MeZO) for fine-tuning language models (LMs). As LMs grow larger, backpropagation becomes computationally costly, requiring large amounts of memory. MeZO adapts the classical Zeroth-order Stochastic Gradient Descent (ZO-SGD) method to operate in-place, enabling fine-tuning of LMs with the same memory footprint as inference.

For instance, with a single A100 80GB GPU, MeZO can train a 30-billion parameter model, whereas fine-tuning with backpropagation can only train a 2.7-billion parameter LM with the same resources. …
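The key idea behind the in-place trick is that the random perturbation never needs to be stored: it is regenerated from a saved RNG seed for each of the two forward passes and again for the update. Below is a minimal sketch of one MeZO-style zeroth-order step, assuming a PyTorch model and a `loss_fn(model, batch)` callable; the function name `mezo_step` and the hyperparameters `lr` and `eps` are illustrative, not the paper's reference implementation.

```python
import torch

def mezo_step(model, loss_fn, batch, lr=1e-6, eps=1e-3):
    """One in-place zeroth-order update: perturb, evaluate twice, restore, update."""
    seed = torch.randint(0, 2**31 - 1, (1,)).item()
    params = [p for p in model.parameters() if p.requires_grad]

    def perturb(scale):
        # Regenerate the same Gaussian noise from `seed`, so no perturbation
        # vector is ever stored -- this keeps memory at inference level.
        torch.manual_seed(seed)
        for p in params:
            z = torch.randn_like(p)
            p.data.add_(scale * eps * z)

    with torch.no_grad():
        perturb(+1.0)                         # theta + eps * z
        loss_plus = loss_fn(model, batch).item()
        perturb(-2.0)                         # theta - eps * z
        loss_minus = loss_fn(model, batch).item()
        perturb(+1.0)                         # restore theta

        # Scalar projected-gradient estimate from the two forward passes.
        grad_scalar = (loss_plus - loss_minus) / (2.0 * eps)

        # Re-draw the same z one last time and take an SGD step along it.
        torch.manual_seed(seed)
        for p in params:
            z = torch.randn_like(p)
            p.data.add_(-lr * grad_scalar * z)

    return (loss_plus + loss_minus) / 2.0
```

Only forward passes are needed, so no activations or gradients are cached, which is why the memory footprint matches inference.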

