Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Sept. 11, 2023, 4:27 p.m. | Tianle Cai*, Yuhong Li*, Zhengyang Geng, Hongwu Peng, Tri Dao (* Equal contribution)

Blog Content - TOGETHER www.together.xyz

Announcing LLaMA-2-7B-32K-Instruct, a long-context instruction model
fine-tuned using Together API.

api context decoding framework llama llm multiple research simple together

More from www.together.xyz / Blog Content - TOGETHER

RedPajama-Data-v2: an Open Dataset with 30 Trillion Tokens for Training Large Language Models 5 months, 3 weeks ago | www.together.xyz

annotations data data quality dataset +12

Flash-Decoding for long-context inference 6 months, 2 weeks ago | www.together.xyz

attention context decoding faster +3

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads 7 months, 2 weeks ago | www.together.xyz

api context decoding framework +6

Llama-2-7B-32K-Instruct — and fine-tuning for Llama-2 models with Together API 8 months, 1 week ago | www.together.xyz

api context fine-tuning llama +2

Faster inference enables up to 5x price reduction on Together API 8 months, 2 weeks ago | www.together.xyz

ai stack api cost efficiency +12

Preparing for the era of 32K context: Early learnings and explorations 8 months, 4 weeks ago | www.together.xyz

context document understanding llama research +2

Monarch Mixer: A new model architecture for increased efficiency 9 months ago | www.together.xyz

architecture efficiency exploration look +2

Introducing Together AI Chief Scientist Tri Dao, as he releases FlashAttention-2 to speed up model … 9 months, 1 week ago | www.together.xyz

ai models dao fine-tuning inference +5

Together AI and Snorkel AI empower enterprises to build proprietary LLMs 9 months, 1 week ago | www.together.xyz

build data enterprises environments +5

Data Architect

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

View on ai-jobs.net

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

View on ai-jobs.net

Senior ML Engineer

@ Carousell Group | Ho Chi Minh City, Vietnam

View on ai-jobs.net

Data and Insight Analyst

@ Cotiviti | Remote, United States

View on ai-jobs.net

View more jobs

all AI news

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

More from www.together.xyz / Blog Content - TOGETHER

Jobs in AI, ML, Big Data

Data Architect

Data ETL Engineer

Lead GNSS Data Scientist

Senior Machine Learning Engineer (MLOps)

Senior ML Engineer

Data and Insight Analyst