Oct. 12, 2023, 5:59 p.m. | Tri Dao, Daniel Haziza, Francisco Massa, Grigory Sizov

Source: Together (www.together.xyz)

We present a technique, Flash-Decoding, that significantly speeds up
attention during inference, bringing up to 8x faster generation for very
long sequences.
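At a high level, Flash-Decoding parallelizes attention across the key/value sequence length: the KV cache is split into chunks, a partial softmax is computed for each chunk, and the partial results are combined with a log-sum-exp rescaling so the final output matches ordinary attention. The snippet below is a minimal, non-fused PyTorch sketch of that split-KV reduction, not the optimized CUDA kernels; the function name, tensor shapes, and `num_splits` parameter are illustrative assumptions.

```python
import torch

def flash_decoding_reference(q, k, v, num_splits=4):
    """Illustrative split-KV attention for a single decoding step.

    q: (batch, heads, 1, d)    -- the one new query token
    k: (batch, heads, seq, d)  -- cached keys
    v: (batch, heads, seq, d)  -- cached values

    The KV cache is split along the sequence dimension; each split yields a
    partial softmax (running max, numerator, denominator), and the partials
    are merged with a log-sum-exp rescaling.
    """
    d = q.shape[-1]
    scale = d ** -0.5
    chunks_k = k.chunk(num_splits, dim=2)
    chunks_v = v.chunk(num_splits, dim=2)

    max_so_far = None
    num = None  # running numerator: sum_j exp(s_j - max) * v_j
    den = None  # running denominator: sum_j exp(s_j - max)

    for kc, vc in zip(chunks_k, chunks_v):
        s = (q * scale) @ kc.transpose(-1, -2)      # scores, (b, h, 1, chunk)
        chunk_max = s.amax(dim=-1, keepdim=True)    # (b, h, 1, 1)
        p = (s - chunk_max).exp()
        chunk_num = p @ vc                          # (b, h, 1, d)
        chunk_den = p.sum(dim=-1, keepdim=True)     # (b, h, 1, 1)

        if max_so_far is None:
            max_so_far, num, den = chunk_max, chunk_num, chunk_den
        else:
            # Rescale both running and new partials to the shared max, then add.
            new_max = torch.maximum(max_so_far, chunk_max)
            num = num * (max_so_far - new_max).exp() + chunk_num * (chunk_max - new_max).exp()
            den = den * (max_so_far - new_max).exp() + chunk_den * (chunk_max - new_max).exp()
            max_so_far = new_max

    return num / den

if __name__ == "__main__":
    b, h, seq, d = 2, 4, 1024, 64
    q = torch.randn(b, h, 1, d)
    k = torch.randn(b, h, seq, d)
    v = torch.randn(b, h, seq, d)
    out = flash_decoding_reference(q, k, v, num_splits=8)
    ref = torch.softmax(q @ k.transpose(-1, -2) * d ** -0.5, dim=-1) @ v
    print(torch.allclose(out, ref, atol=1e-5))  # True: split-KV result matches full softmax
```

This reference loops over the splits sequentially for clarity; the point of Flash-Decoding is that the splits can be processed by independent thread blocks in parallel and reduced in a final step, which is what keeps the GPU busy when the batch is small but the sequence is very long.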

