April 24, 2023, 11:17 p.m. | Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Christopher Re, Ce Zhang

Blog Content - TOGETHER www.together.xyz

Distributed training of foundation models, especially large language models
(LLMs), is communication-intensive and so has heavily relied on centralized
data centers with fast interconnects. Can we train on slow networks and
unlock the potential of decentralized infrastructure for foundation models?
In this paper, we propose CocktailSGD, a novel communication-efficient
training framework that combines three distinct compression techniques --
random sparsification, top-K sparsification, and quantization -- to achieve
much greater compression than each individual technique alone. We justify
the benefit of …
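
To make the combination concrete, here is a minimal NumPy sketch of how the three compressors named above could be chained on a flattened gradient. This is an illustration only, not the paper's actual algorithm or API: the function names (cocktail_compress, cocktail_decompress) and the parameters (random_frac, top_k, num_bits) are hypothetical, and CocktailSGD's real ordering, error-feedback mechanism, and hyperparameter choices are described in the paper itself.

```python
import numpy as np

def cocktail_compress(grad, random_frac=0.1, top_k=1000, num_bits=4, seed=0):
    """Illustrative sketch: chain random sparsification, top-K
    sparsification, and uniform quantization on a flat gradient."""
    rng = np.random.default_rng(seed)
    n = grad.size

    # (1) Random sparsification: sample a random subset of coordinates.
    m = max(1, int(random_frac * n))
    subset = rng.choice(n, size=m, replace=False)
    sub_vals = grad[subset]

    # (2) Top-K sparsification: keep the K largest-magnitude entries
    #     within the random subset.
    k = min(top_k, sub_vals.size)
    top = np.argpartition(np.abs(sub_vals), -k)[-k:]
    idx = subset[top]      # coordinate indices to transmit
    vals = sub_vals[top]   # their full-precision values

    # (3) Quantization: map the surviving values to num_bits-wide integers.
    scale = np.abs(vals).max() + 1e-12
    levels = 2 ** (num_bits - 1) - 1
    q = np.round(vals / scale * levels).astype(np.int8)

    return idx, q, scale

def cocktail_decompress(idx, q, scale, n, num_bits=4):
    """Rebuild a dense gradient estimate from the compressed message."""
    levels = 2 ** (num_bits - 1) - 1
    grad_hat = np.zeros(n, dtype=np.float32)
    grad_hat[idx] = q.astype(np.float32) / levels * scale
    return grad_hat

# Example: compress a synthetic gradient and inspect the message size.
grad = np.random.randn(1_000_000).astype(np.float32)
idx, q, scale = cocktail_compress(grad)
print(len(idx), "coordinates sent instead of", grad.size)
grad_hat = cocktail_decompress(idx, q, scale, grad.size)
```

In this toy setup, random sparsification first cuts the candidate set to a fraction of the coordinates, top-K then keeps only the largest survivors, and low-bit quantization shrinks each transmitted value, which is why the composed message can be far smaller than what any single compressor achieves alone.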
