all AI news
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
March 6, 2024, 5:43 a.m. | Hong Liu, Zhiyuan Li, David Hall, Percy Liang, Tengyu Ma
cs.LG updates on arXiv.org arxiv.org
Abstract: Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction on the time and cost of training. Adam and its variants have been state-of-the-art for years, and more sophisticated second-order (Hessian-based) optimizers often incur too much per-step overhead. In this paper, we propose Sophia, Second-order Clipped Stochastic Optimization, a simple scalable second-order optimizer that uses a light-weight estimate of the diagonal Hessian as the …
abstract adam algorithm art arxiv cost cs.cl cs.lg improvement language language model massive material math.oc optimization pre-training scalable sophia state stochastic training type variants
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
Software Engineer for AI Training Data (School Specific)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Python)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Tier 2)
@ G2i Inc | Remote
Data Engineer
@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania
Artificial Intelligence – Bioinformatic Expert
@ University of Texas Medical Branch | Galveston, TX
Lead Developer (AI)
@ Cere Network | San Francisco, US