Oct. 12, 2023, 5:59 p.m. | Tri Dao, Daniel Haziza, Francisco Massa, Grigory Sizov

Source: Together AI blog (www.together.xyz)

We present a technique, Flash-Decoding, that significantly speeds up
attention during inference, bringing up to 8x faster generation for very
long sequences.

Tags: attention, context, decoding, faster, flash, inference, research
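The core idea behind Flash-Decoding is to parallelize attention over the length of the KV cache during decoding: the keys and values are split into chunks, partial attention is computed for each chunk, and the partial outputs are combined with a log-sum-exp rescaling. Below is a minimal sketch of that split-then-reduce step, not the optimized CUDA kernel described in the post; the function name, chunk count, and tensor shapes are illustrative assumptions.

```python
# Sketch of split-KV decoding attention (single query token).
# Assumed shapes: q is (d,), k and v are (seq_len, d).
import torch

def split_kv_attention(q, k, v, num_splits=4):
    """Computes softmax(q k^T / sqrt(d)) v by splitting the KV cache."""
    d = q.shape[-1]
    scale = d ** -0.5
    outs, lses = [], []
    for k_chunk, v_chunk in zip(k.chunk(num_splits), v.chunk(num_splits)):
        # Partial attention over this chunk of the KV cache.
        scores = (k_chunk @ q) * scale        # (chunk_len,)
        lse = torch.logsumexp(scores, dim=0)  # log of this chunk's softmax denominator
        probs = torch.exp(scores - lse)
        outs.append(probs @ v_chunk)          # chunk-local attention output
        lses.append(lse)
    # Rescale each chunk's output by its share of the global softmax denominator.
    weights = torch.softmax(torch.stack(lses), dim=0)  # (num_splits,)
    return (torch.stack(outs) * weights[:, None]).sum(dim=0)

# Sanity check against a single full-softmax pass.
q = torch.randn(64)
k = torch.randn(4096, 64)
v = torch.randn(4096, 64)
ref = torch.softmax((k @ q) * 64 ** -0.5, dim=0) @ v
assert torch.allclose(split_kv_attention(q, k, v), ref, atol=1e-5)
```

Because each chunk can be processed independently, this reduction exposes parallelism across the sequence dimension even when the batch size is 1, which is where decoding of very long sequences would otherwise leave the GPU underutilized.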
