Oct. 12, 2023, 5:59 p.m. | Tri Dao, Daniel Haziza, Francisco Massa, Grigory Sizov

Source: Together (www.together.xyz)

We present a technique, Flash-Decoding, that significantly speeds up
attention during inference, bringing up to 8x faster generation for very
long sequences.
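At a high level, Flash-Decoding parallelizes attention across the key/value sequence length: the KV cache is split into chunks, a partial softmax is computed for each chunk, and the partial results are combined with a log-sum-exp rescaling so the final output matches ordinary attention. The snippet below is a minimal, non-fused PyTorch sketch of that split-KV reduction, not the optimized CUDA kernels; the function name, tensor shapes, and `num_splits` parameter are illustrative assumptions.

```python
import torch

def flash_decoding_reference(q, k, v, num_splits=4):
    """Illustrative split-KV attention for a single decoding step.

    q: (batch, heads, 1, d)    -- the one new query token
    k: (batch, heads, seq, d)  -- cached keys
    v: (batch, heads, seq, d)  -- cached values

    The KV cache is split along the sequence dimension; each split yields a
    partial softmax (running max, numerator, denominator), and the partials
    are merged with a log-sum-exp rescaling.
    """
    d = q.shape[-1]
    scale = d ** -0.5
    chunks_k = k.chunk(num_splits, dim=2)
    chunks_v = v.chunk(num_splits, dim=2)

    max_so_far = None
    num = None  # running numerator: sum_j exp(s_j - max) * v_j
    den = None  # running denominator: sum_j exp(s_j - max)

    for kc, vc in zip(chunks_k, chunks_v):
        s = (q * scale) @ kc.transpose(-1, -2)      # scores, (b, h, 1, chunk)
        chunk_max = s.amax(dim=-1, keepdim=True)    # (b, h, 1, 1)
        p = (s - chunk_max).exp()
        chunk_num = p @ vc                          # (b, h, 1, d)
        chunk_den = p.sum(dim=-1, keepdim=True)     # (b, h, 1, 1)

        if max_so_far is None:
            max_so_far, num, den = chunk_max, chunk_num, chunk_den
        else:
            # Rescale both running and new partials to the shared max, then add.
            new_max = torch.maximum(max_so_far, chunk_max)
            num = num * (max_so_far - new_max).exp() + chunk_num * (chunk_max - new_max).exp()
            den = den * (max_so_far - new_max).exp() + chunk_den * (chunk_max - new_max).exp()
            max_so_far = new_max

    return num / den

if __name__ == "__main__":
    b, h, seq, d = 2, 4, 1024, 64
    q = torch.randn(b, h, 1, d)
    k = torch.randn(b, h, seq, d)
    v = torch.randn(b, h, seq, d)
    out = flash_decoding_reference(q, k, v, num_splits=8)
    ref = torch.softmax(q @ k.transpose(-1, -2) * d ** -0.5, dim=-1) @ v
    print(torch.allclose(out, ref, atol=1e-5))  # True: split-KV result matches full softmax
```

This reference loops over the splits sequentially for clarity; the point of Flash-Decoding is that the splits can be processed by independent thread blocks in parallel and reduced in a final step, which is what keeps the GPU busy when the batch is small but the sequence is very long.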

