Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
March 6, 2024, 5:43 a.m. | Hong Liu, Zhiyuan Li, David Hall, Percy Liang, Tengyu Ma
cs.LG updates on arXiv.org (arxiv.org)
Abstract: Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction in the time and cost of training. Adam and its variants have been state-of-the-art for years, while more sophisticated second-order (Hessian-based) optimizers often incur too much per-step overhead. In this paper, we propose Sophia, Second-order Clipped Stochastic Optimization, a simple, scalable second-order optimizer that uses a light-weight estimate of the diagonal Hessian as the …
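The truncated abstract describes the core idea: precondition the gradient momentum with a cheap running estimate of the diagonal Hessian, then clip the resulting per-coordinate step. Below is a minimal NumPy sketch of such an update based on that description. The hyperparameter names and values (beta1, beta2, gamma, eps), the element-wise clip to [-1, 1], and the choice to refresh the Hessian estimate only on some steps are illustrative assumptions, not a verbatim reproduction of the paper's algorithm; `hess_diag_est` stands in for whatever light-weight diagonal-Hessian estimator supplies that quantity.

```python
import numpy as np

def sophia_style_update(theta, m, h, grad, hess_diag_est,
                        lr=1e-4, beta1=0.96, beta2=0.99,
                        gamma=0.01, eps=1e-12, refresh_hessian=True):
    """One parameter update in the spirit of the abstract (illustrative sketch).

    theta: parameters; m: EMA of gradients; h: EMA of the diagonal-Hessian
    estimate. Hyperparameter names/values are assumptions for this sketch.
    """
    # Gradient momentum (exponential moving average of gradients).
    m = beta1 * m + (1 - beta1) * grad
    # Refresh the diagonal-Hessian EMA only on designated steps to keep
    # the per-step overhead light.
    if refresh_hessian:
        h = beta2 * h + (1 - beta2) * hess_diag_est
    # Precondition by the (scaled) diagonal Hessian, guarding against
    # tiny or zero curvature with eps.
    ratio = m / np.maximum(gamma * h, eps)
    # Element-wise clipping bounds each coordinate's move to at most lr,
    # which keeps the step stable when the curvature estimate is noisy.
    theta = theta - lr * np.clip(ratio, -1.0, 1.0)
    return theta, m, h

# Toy usage: a single update on a 3-parameter problem.
theta = np.zeros(3)
m = np.zeros(3)
h = np.zeros(3)
grad = np.array([0.5, -0.2, 0.1])
hess_diag_est = np.array([2.0, 0.5, 1.0])
theta, m, h = sophia_style_update(theta, m, h, grad, hess_diag_est)
```

The clipping is what distinguishes this kind of update from a plain Newton-style step: even where the Hessian estimate is near zero, no coordinate moves by more than the learning rate in a single step.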