April 16, 2024, noon | code_your_own_AI

code_your_own_AI www.youtube.com

Ring Attention enables context lengths of 1 million tokens for our latest LLMs and VLMs. How is this possible? What happens to the quadratic complexity of self-attention in sequence length?
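To see why a naive implementation cannot reach that scale: the attention score matrix QK^T grows quadratically with sequence length, so at 1 million tokens it holds 10^12 entries. A quick back-of-the-envelope check (my own illustration, not from the video):

    seq_len = 1_000_000              # 1 million tokens
    bytes_per_score = 2              # bf16 / fp16
    score_matrix_bytes = seq_len ** 2 * bytes_per_score
    print(f"{score_matrix_bytes / 1e12:.1f} TB per attention head")  # -> 2.0 TB

No single accelerator can materialize a matrix of that size, which is exactly the problem blockwise and ring attention address.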

In this video, I explain the Blockwise Parallel Transformer idea from UC Berkeley and follow it through to the actual code implementation on GitHub for Ring Attention with Blockwise Transformers.
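For intuition before diving into the repository, here is a minimal, single-device NumPy sketch of the blockwise attention idea (my own illustration, not the UC Berkeley JAX code; the function name and block size are hypothetical):

    import numpy as np

    def blockwise_attention(q, k, v, block_size=128):
        """Compute softmax(q @ k.T / sqrt(d)) @ v one key/value block at a time.

        The full (seq_len x seq_len) score matrix is never materialized, which is
        the core trick of the Blockwise Parallel Transformer. Ring Attention goes
        further: it shards the k/v blocks across devices and rotates them in a
        ring, so each device only ever holds its local blocks.
        """
        seq_len, d = q.shape
        scale = 1.0 / np.sqrt(d)

        out = np.zeros((seq_len, d))
        row_max = np.full((seq_len, 1), -np.inf)  # running max for stable softmax
        row_sum = np.zeros((seq_len, 1))          # running softmax denominator

        for start in range(0, seq_len, block_size):
            k_blk = k[start:start + block_size]   # in Ring Attention this block
            v_blk = v[start:start + block_size]   # arrives from the neighbor device

            scores = q @ k_blk.T * scale
            new_max = np.maximum(row_max, scores.max(axis=-1, keepdims=True))

            # Rescale the accumulators to the new max, then fold in this block.
            correction = np.exp(row_max - new_max)
            p = np.exp(scores - new_max)
            out = out * correction + p @ v_blk
            row_sum = row_sum * correction + p.sum(axis=-1, keepdims=True)
            row_max = new_max

        return out / row_sum

The result matches a dense softmax(QK^T / sqrt(d)) V computed in one shot; the long-context win comes from never storing more than one block of scores at a time and, in the ring setup, from overlapping the block rotation between devices with the per-block compute.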

The current Google Gemini 1.5 Pro has a context length of 1 million tokens on Vertex AI.

00:00 3 ways for infinite context lengths
02:05 …

