Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning
Feb. 27, 2024, 5:41 a.m. | Yong Liu, Zirui Zhu, Chaoyu Gong, Minhao Cheng, Cho-Jui Hsieh, Yang You
cs.LG updates on arXiv.org (arxiv.org)
Abstract: While fine-tuning large language models (LLMs) for specific tasks often yields impressive results, it comes at the cost of memory inefficiency due to back-propagation in gradient-based training. Memory-efficient Zeroth-order (MeZO) optimizers, recently proposed to address this issue, only require forward passes during training, making them more memory-friendly. However, the quality of gradient estimates in zeroth-order optimization often depends on the data dimensionality, potentially explaining why MeZO still exhibits significant performance drops compared to standard …
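To make the forward-only idea concrete, here is a minimal sketch of a MeZO-style zeroth-order step (a two-point SPSA estimate) with an optional sparse perturbation mask in the spirit of the paper. This is an illustrative assumption, not the authors' released code: the function `zo_step`, the user-supplied `loss_fn`, and the `sparsity`, `eps`, and `seed` parameters are all hypothetical names chosen for the example.

```python
# Hedged sketch of a forward-only (zeroth-order) update, roughly MeZO-style.
# Not the authors' implementation; the sparse mask only illustrates the idea
# of perturbing a subset of parameters.
import torch

def zo_step(model, loss_fn, batch, lr=1e-6, eps=1e-3, sparsity=0.75, seed=0):
    """One update using two forward passes: perturb weights by +eps*z and
    -eps*z, estimate the directional derivative from the two losses, then
    move against it along the same direction z."""
    params = [p for p in model.parameters() if p.requires_grad]

    def perturb(scale):
        # Re-seeding regenerates the same random direction z (and mask)
        # instead of storing it, which keeps memory near inference level.
        torch.manual_seed(seed)
        for p in params:
            z = torch.randn_like(p)
            mask = (torch.rand_like(p) > sparsity).float()  # sparse perturbation
            p.data.add_(scale * eps * z * mask)

    with torch.no_grad():               # forward passes only, no autograd graph
        perturb(+1.0)
        loss_plus = loss_fn(model, batch)
        perturb(-2.0)                   # go from +eps*z to -eps*z
        loss_minus = loss_fn(model, batch)
        perturb(+1.0)                   # restore the original weights

        grad_scale = (loss_plus - loss_minus) / (2 * eps)  # scalar SPSA estimate

        torch.manual_seed(seed)         # regenerate the same z / mask for the update
        for p in params:
            z = torch.randn_like(p)
            mask = (torch.rand_like(p) > sparsity).float()
            p.data.add_(-lr * grad_scale * z * mask)

    return loss_plus.item()
```

The scalar `grad_scale` multiplies the same random direction used for the perturbation, so only two losses and a random seed need to be kept between the forward passes; the sparse mask simply restricts both the perturbation and the update to a random subset of weights.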