March 6, 2024, 5:43 a.m. | Hong Liu, Zhiyuan Li, David Hall, Percy Liang, Tengyu Ma

cs.LG updates on arXiv.org

arXiv:2305.14342v4 Announce Type: replace
Abstract: Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction in the time and cost of training. Adam and its variants have been state-of-the-art for years, while more sophisticated second-order (Hessian-based) optimizers often incur too much per-step overhead. In this paper, we propose Sophia, Second-order Clipped Stochastic Optimization, a simple, scalable second-order optimizer that uses a light-weight estimate of the diagonal Hessian as the …
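
The name suggests the basic mechanics: a diagonal-Hessian pre-conditioner applied to a moving average of gradients, with element-wise clipping to bound each step. The sketch below illustrates one such update in PyTorch; the Hutchinson-style estimator and all hyperparameter names are illustrative assumptions, since the abstract above is truncated before those details, and this is not the authors' released implementation.

```python
# A minimal sketch of a Sophia-style step, assuming the update divides an EMA of
# gradients by an EMA of an estimated diagonal Hessian and clips element-wise.
# Hyperparameters (lr, beta1, rho, eps) and the estimator are illustrative choices.
import torch

def hutchinson_diag_hessian(loss_fn, params):
    """One-sample Hutchinson estimate of diag(H): u * (H u) with u ~ Rademacher."""
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params, create_graph=True)
    us = [torch.randint_like(p, 0, 2) * 2.0 - 1.0 for p in params]   # +/-1 probe vectors
    hvps = torch.autograd.grad(grads, params, grad_outputs=us)       # Hessian-vector products
    return [u * hv for u, hv in zip(us, hvps)]                       # element-wise u * (H u)

@torch.no_grad()
def sophia_step(params, grads, exp_avg, hess_diag,
                lr=1e-4, beta1=0.96, rho=0.04, eps=1e-12):
    """One update: gradient EMA, preconditioned by the Hessian-diagonal EMA, then clipped."""
    for p, g, m, h in zip(params, grads, exp_avg, hess_diag):
        m.mul_(beta1).add_(g, alpha=1 - beta1)        # momentum (EMA of gradients)
        denom = (rho * h).clamp(min=eps)              # lightweight diagonal pre-conditioner
        update = (m / denom).clamp(-1.0, 1.0)         # element-wise clipping bounds the step
        p.add_(update, alpha=-lr)

# Toy usage on a quadratic loss; in practice the Hessian estimate would be
# refreshed only every k steps and maintained as its own moving average.
w = torch.randn(4, requires_grad=True)
loss_fn = lambda: (w ** 2).sum()
g = torch.autograd.grad(loss_fn(), [w])
m, h = [torch.zeros_like(w)], [torch.zeros_like(w)]
for h_ema, h_hat in zip(h, hutchinson_diag_hessian(loss_fn, [w])):
    h_ema.mul_(0.99).add_(h_hat, alpha=0.01)          # EMA of the Hessian-diagonal estimate
sophia_step([w], g, m, h, lr=1e-3)
```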
