Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training

May 31, 2023, 10:55 a.m. | /u/shreyansh26

Artificial Intelligence www.reddit.com

Wrote up a blog post on the new second-order optimizer Sophia, which is showing encouraging results on LLM pretraining.

The paper makes good use of advanced optimization theory, and I have included resources covering that background in my blog post.

Blog - [https://shreyansh26.github.io/post/2023-05-28\_sophia\_scalable\_second\_order\_optimizer\_llms/](https://shreyansh26.github.io/post/2023-05-28_sophia_scalable_second_order_optimizer_llms/)

Annotated Paper - [Sophia Annotated Paper - GitHub](https://github.com/shreyansh26/Annotated-ML-Papers/blob/main/ML%20Theory/Sophia%20-%20A%20Scalable%20Stochastic%20Second-order%20Optimizer%20for%20Language%20Model%20Pretraining.pdf)


