March 1, 2024, 4:28 a.m. | /u/we_are_mammals

Machine Learning www.reddit.com

[https://arxiv.org/abs/2402.19427](https://arxiv.org/abs/2402.19427)

**Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models**

Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN with gated linear recurrences, and Griffin, a hybrid model that mixes gated linear recurrences with local attention. Hawk exceeds the reported performance of Mamba on downstream tasks, while Griffin matches the performance of Llama-2 despite being trained on over …
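
The abstract's core idea is a gated *linear* recurrence: the hidden state is updated elementwise by learned gates, so each step is cheap and the memory footprint stays constant in sequence length. The paper's actual recurrence (the RG-LRU block inside Hawk/Griffin) has its own specific gating; the snippet below is only a minimal generic sketch of a gated linear recurrence, with hypothetical weight names `W_a` and `W_i`, to make the mechanism concrete.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_linear_recurrence(x, W_a, W_i, initial_state=None):
    """
    Generic gated linear recurrence over a sequence x of shape (T, d):
        a_t = sigmoid(x_t @ W_a)                 # per-channel forget gate in (0, 1)
        i_t = sigmoid(x_t @ W_i)                 # per-channel input gate
        h_t = a_t * h_{t-1} + (1 - a_t) * (i_t * x_t)
    The recurrence is elementwise (diagonal), so each step costs O(d)
    and the state is a single d-vector -- this is what gives RNN-style
    models fast inference and constant memory in sequence length.
    """
    T, d = x.shape
    h = np.zeros(d) if initial_state is None else initial_state
    out = np.empty_like(x)
    for t in range(T):
        a = sigmoid(x[t] @ W_a)
        i = sigmoid(x[t] @ W_i)
        h = a * h + (1.0 - a) * (i * x[t])
        out[t] = h
    return out

# Tiny usage example with random weights (illustrative values only).
rng = np.random.default_rng(0)
T, d = 16, 8
x = rng.standard_normal((T, d))
W_a = rng.standard_normal((d, d)) * 0.1
W_i = rng.standard_normal((d, d)) * 0.1
y = gated_linear_recurrence(x, W_a, W_i)
print(y.shape)  # (16, 8)
```

In the hybrid Griffin setup described in the abstract, layers like this are interleaved with local (sliding-window) attention layers, so the attention cost is bounded by the window size rather than the full sequence length.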

