March 6, 2024, 5:43 a.m. | Hong Liu, Zhiyuan Li, David Hall, Percy Liang, Tengyu Ma

cs.LG updates on arXiv.org

arXiv:2305.14342v4 Announce Type: replace
Abstract: Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction in the time and cost of training. Adam and its variants have been state-of-the-art for years, while more sophisticated second-order (Hessian-based) optimizers often incur too much per-step overhead. In this paper, we propose Sophia, Second-order Clipped Stochastic Optimization, a simple scalable second-order optimizer that uses a light-weight estimate of the diagonal Hessian as the …
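To make the idea concrete, below is a minimal sketch of what an update step of this kind could look like, inferred only from the abstract: an exponential moving average of gradients is pre-conditioned by a lightweight estimate of the diagonal Hessian, and the resulting update is clipped element-wise (per "Second-order Clipped Stochastic Optimization"). The parameter names (lr, beta1, beta2, rho, eps) and the exact clipping rule are assumptions for illustration, not the paper's definitive algorithm.

```python
# Hypothetical Sophia-style update step, sketched from the abstract only.
# Assumptions: the diagonal-Hessian estimate acts as a pre-conditioner and the
# pre-conditioned step is clipped element-wise; hyperparameter names/values are
# illustrative, not taken from the paper.
import numpy as np

def sophia_style_step(theta, grad, hess_diag_est, m, h,
                      lr=1e-4, beta1=0.96, beta2=0.99, rho=0.04, eps=1e-12):
    """One parameter update.

    theta         : current parameters (np.ndarray)
    grad          : stochastic gradient at theta
    hess_diag_est : cheap estimate of the Hessian diagonal at theta
    m, h          : running averages of the gradient and Hessian diagonal
    """
    # Exponential moving averages of the gradient and the diagonal-Hessian estimate.
    m = beta1 * m + (1 - beta1) * grad
    h = beta2 * h + (1 - beta2) * hess_diag_est

    # Pre-conditioned step, clipped element-wise to magnitude at most rho.
    update = np.clip(m / np.maximum(h, eps), -rho, rho)
    theta = theta - lr * update
    return theta, m, h
```

Because the Hessian estimate is diagonal and only needs to be refreshed occasionally, the per-step overhead of such a scheme stays close to that of a first-order optimizer like Adam, which is the scalability argument the abstract makes.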

