April 21, 2024, 2 p.m. | code_your_own_AI

code_your_own_AI www.youtube.com

A brand new Language Model Architecture: RecurrentLLM. Moving Past Transformers.
Google developed RecurrentGemma-2B and compares this new LM architecture (!) with the classical transformer-based Gemma-2B, whose global self-attention scales quadratically with sequence length. The reported throughput of the new model is about 6,000 tokens per second.
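To make the complexity claim concrete, here is a minimal sketch, not the RecurrentGemma code, of why per-token decode cost grows with context length under global self-attention but stays constant with a fixed-size recurrent state. All dimensions, the cache length, and the gate value are illustrative assumptions.

```python
# Illustrative sketch only: contrasts per-token decode cost of attention over a
# growing KV cache with a constant-time recurrent state update.
import numpy as np

d = 64                      # hidden size (illustrative)
x_t = np.random.randn(d)    # current token's hidden vector

# Transformer-style decoding: the cache grows with the number of previous tokens T,
# so each new token does O(T*d) work and a full sequence costs O(T^2 * d).
kv_cache = [np.random.randn(d) for _ in range(1000)]    # 1000 previous tokens (assumed)
scores = np.array([x_t @ k for k in kv_cache])           # work grows with len(kv_cache)
weights = np.exp(scores - scores.max())
attn_out = (weights / weights.sum()) @ np.stack(kv_cache)

# Recurrent-style decoding: one fixed-size state, so each new token costs O(d)
# no matter how many tokens came before.
state = np.zeros(d)
decay = 0.9                                               # illustrative gate value
state = decay * state + (1.0 - decay) * x_t               # constant-time update
```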

Introduction and Model Architecture:
The original paper by Google introduces "RecurrentGemma-2B," leveraging the Griffin architecture, which moves away from traditional global attention mechanisms in favor of a combination of linear recurrences and local attention. This design enables the …
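As a rough illustration of that combination, the toy sketch below pairs a gated linear recurrence with causal sliding-window (local) attention. It is an assumption-laden simplification, not the Griffin or RecurrentGemma implementation; the gating form, window size, and the way the two mixers are stacked here are illustrative only.

```python
# Toy sketch of the two sequence mixers the paper combines:
# (1) a gated linear recurrence, (2) local (sliding-window) causal attention.
import numpy as np

def gated_linear_recurrence(x, w_a, w_x):
    """x: (T, d). Per step: h_t = a_t * h_{t-1} + (1 - a_t) * (x_t @ W_x),
    with a_t a sigmoid gate. State size is fixed, so cost is linear in T."""
    T, d = x.shape
    h = np.zeros(d)
    out = np.zeros_like(x)
    for t in range(T):
        a_t = 1.0 / (1.0 + np.exp(-(x[t] @ w_a)))   # gate in (0, 1)
        h = a_t * h + (1.0 - a_t) * (x[t] @ w_x)     # fixed-size recurrent state
        out[t] = h
    return out

def local_attention(x, window=4):
    """Each position attends only to the last `window` positions (causal, local),
    so per-token cost is bounded by the window, not the full context."""
    T, d = x.shape
    out = np.zeros_like(x)
    for t in range(T):
        ctx = x[max(0, t - window + 1):t + 1]         # at most `window` keys/values
        scores = ctx @ x[t] / np.sqrt(d)
        w = np.exp(scores - scores.max())
        out[t] = (w / w.sum()) @ ctx
    return out

T, d = 16, 8
x = np.random.randn(T, d)
w_a = np.random.randn(d, d) * 0.1
w_x = np.random.randn(d, d) * 0.1
y = local_attention(gated_linear_recurrence(x, w_a, w_x))
print(y.shape)  # (16, 8)
```

In the actual architecture the recurrent and local-attention blocks sit in separate residual layers rather than back-to-back as here; stacking them directly just keeps the example compact.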

