April 16, 2024, noon | code_your_own_AI


Ring Attention enables context lengths of 1 million tokens for the latest LLMs and VLMs. How is this possible? What happens to the quadratic complexity of self-attention in the sequence length?
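The short answer: the quadratic compute does not disappear, but the full n x n attention matrix is never materialized. Below is a minimal NumPy sketch (not the video's code, and not the GitHub implementation) of the blockwise / online-softmax idea that both Blockwise Parallel Transformer and Ring Attention build on; the function name blockwise_attention and the single-head, unmasked setup are my own simplifications.

```python
import numpy as np

def blockwise_attention(q, k, v, block_size):
    """Compute softmax(q k^T / sqrt(d)) v one key/value block at a time,
    using the online-softmax rescaling trick so the full (n x n) score
    matrix is never held in memory."""
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(q)            # running weighted sum of values
    row_max = np.full(n, -np.inf)     # running max of scores per query
    row_sum = np.zeros(n)             # running softmax denominator

    for start in range(0, n, block_size):
        k_blk = k[start:start + block_size]
        v_blk = v[start:start + block_size]
        scores = (q @ k_blk.T) * scale              # only (n, block_size)

        new_max = np.maximum(row_max, scores.max(axis=-1))
        correction = np.exp(row_max - new_max)      # rescale old accumulators
        probs = np.exp(scores - new_max[:, None])

        row_sum = row_sum * correction + probs.sum(axis=-1)
        out = out * correction[:, None] + probs @ v_blk
        row_max = new_max

    return out / row_sum[:, None]
```

On random inputs this matches dense softmax(QK^T / sqrt(d)) V up to floating-point error, while the peak activation memory per query block is linear in sequence length instead of quadratic.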

In this video, I go from the Blockwise Parallel Transformer idea from UC Berkeley all the way to the actual code implementation on GitHub of Ring Attention with Blockwise Transformers.
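The video walks through the real (JAX) implementation on GitHub; as a stand-in, here is a hedged single-process simulation of the ring idea: each "device" keeps one query shard fixed while the key/value shards travel around the ring, one hop per step, and the online-softmax statistics are updated locally. Causal masking and real collective communication (the actual implementation overlaps the rotation with compute) are omitted, and ring_attention_sim / num_devices are names invented for this sketch.

```python
import numpy as np

def ring_attention_sim(q, k, v, num_devices):
    """Simulate Ring Attention on one machine: the sequence is split into
    num_devices shards; each 'device' holds one query shard and the
    key/value shards rotate around the ring, one hop per step."""
    n, d = q.shape
    assert n % num_devices == 0
    blk = n // num_devices
    scale = 1.0 / np.sqrt(d)

    q_shards = [q[i * blk:(i + 1) * blk] for i in range(num_devices)]
    kv_shards = [(k[i * blk:(i + 1) * blk], v[i * blk:(i + 1) * blk])
                 for i in range(num_devices)]
    out = [np.zeros((blk, d)) for _ in range(num_devices)]
    m = [np.full(blk, -np.inf) for _ in range(num_devices)]   # running max
    s = [np.zeros(blk) for _ in range(num_devices)]           # running denom

    for _ in range(num_devices):
        for dev in range(num_devices):
            k_blk, v_blk = kv_shards[dev]          # KV shard currently held
            scores = (q_shards[dev] @ k_blk.T) * scale
            new_m = np.maximum(m[dev], scores.max(axis=-1))
            corr = np.exp(m[dev] - new_m)
            p = np.exp(scores - new_m[:, None])
            s[dev] = s[dev] * corr + p.sum(axis=-1)
            out[dev] = out[dev] * corr[:, None] + p @ v_blk
            m[dev] = new_m
        # "send" each KV shard to the next device in the ring
        kv_shards = [kv_shards[(dev - 1) % num_devices]
                     for dev in range(num_devices)]

    return np.concatenate([out[dev] / s[dev][:, None]
                           for dev in range(num_devices)])
```

After num_devices hops every query shard has seen every key/value shard exactly once, so the result equals full (unmasked) attention, yet no device ever stores more than its own shards plus the one KV shard passing through.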

Google's current Gemini 1.5 Pro has a context length of 1 million tokens on Vertex AI.

00:00 3 ways for infinite context lengths
02:05 …

