May 23, 2023, 4:29 p.m. | Hong Liu, Zhiyuan Li, David Hall, Percy Liang, Tengyu Ma

Blog Content - TOGETHER www.together.xyz

Given the massive cost of language model pre-training, even a modest improvement to the optimization algorithm would lead to a material reduction in the time and cost of training. Adam and its variants have been state-of-the-art for years, while more sophisticated second-order (Hessian-based) optimizers often incur too much per-step overhead.
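For context, here is a minimal sketch of the standard Adam update (Kingma & Ba, 2015), the first-order baseline the post refers to. It only maintains running averages of the gradient and squared gradient, which is why its per-step cost is low compared to methods that estimate curvature. The hyperparameter values below are the usual defaults, not anything specific to this post.

```python
import numpy as np

def adam_step(theta, grad, m, v, t,
              lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update with bias correction.

    theta : parameters, grad : gradient at theta,
    m, v  : running first/second moment estimates,
    t     : step count starting at 1.
    """
    m = beta1 * m + (1 - beta1) * grad          # EMA of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # EMA of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Note that every quantity above comes from gradients alone; Hessian-based optimizers additionally need curvature information at each step, which is the per-step overhead the post contrasts against.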

