Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning
Feb. 27, 2024, 5:41 a.m. | Yong Liu, Zirui Zhu, Chaoyu Gong, Minhao Cheng, Cho-Jui Hsieh, Yang You
cs.LG updates on arXiv.org
Abstract: While fine-tuning large language models (LLMs) for specific tasks often yields impressive results, it comes at the cost of memory inefficiency due to back-propagation in gradient-based training. Memory-efficient zeroth-order (MeZO) optimizers, recently proposed to address this issue, require only forward passes during training, making them more memory-friendly. However, the quality of gradient estimates in zeroth-order optimization often depends on the data dimensionality, potentially explaining why MeZO still exhibits significant performance drops compared to standard …
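For context on what a MeZO-style step looks like: the optimizer estimates a directional derivative from two forward passes along a shared random direction and updates the parameters along that direction, so no backward pass or stored gradients are needed. The sketch below is a minimal, hypothetical PyTorch illustration of such a step restricted to a sparse subset of parameters; the function name, the random masking policy, the `loss_fn(model, batch)` interface, and all hyperparameters are assumptions for illustration, not the paper's actual implementation.

```python
import torch


@torch.no_grad()
def sparse_zeroth_order_step(model, loss_fn, batch,
                             eps=1e-3, lr=1e-6, sparsity=0.9, seed=0):
    """Illustrative MeZO-style update that perturbs only a sparse subset of
    parameters; mask policy and hyperparameters are placeholders."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Random sparse mask: roughly (1 - sparsity) of the entries stay active.
    torch.manual_seed(seed + 1)
    masks = [(torch.rand_like(p) > sparsity).float() for p in params]

    def perturb(scale):
        # Re-seeding regenerates the same direction z instead of storing it,
        # which is the memory trick zeroth-order methods like MeZO rely on.
        torch.manual_seed(seed)
        for p, m in zip(params, masks):
            z = torch.randn_like(p)
            p.add_(scale * eps * z * m)

    perturb(+1.0)                                  # theta + eps * z
    loss_plus = float(loss_fn(model, batch))       # forward pass only
    perturb(-2.0)                                  # theta - eps * z
    loss_minus = float(loss_fn(model, batch))      # forward pass only
    perturb(+1.0)                                  # restore theta

    # Finite-difference estimate of the directional derivative along z.
    grad_scalar = (loss_plus - loss_minus) / (2.0 * eps)

    # SGD-style update along the same direction z, masked to the sparse subset.
    torch.manual_seed(seed)
    for p, m in zip(params, masks):
        z = torch.randn_like(p)
        p.add_(-lr * grad_scalar * z * m)
```

Because the random direction is regenerated from a seed rather than stored, the only memory cost beyond inference is the mask, which is what makes this family of methods attractive for fine-tuning large models.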