[R] Gated Linear Attention Transformers with Hardware-Efficient Training
Dec. 12, 2023, 7:37 a.m. | /u/Emergency_Shoulder27
Machine Learning · www.reddit.com
**Code:** [https://github.com/sustcsonglin/gated_linear_attention_layer](https://github.com/sustcsonglin/gated_linear_attention_layer)
**Abstract**:
>Transformers with linear attention allow for efficient parallel training but can simultaneously be formulated as an RNN with 2D (matrix-valued) hidden states, thus enjoying linear (with respect to output length) inference complexity. Recent works such as RetNet (Sun et al., 2023) and TransNormerLLM (Qin et al., 2023a) observe that adding a global decay term to the additive RNN update rule greatly improves performance, sometimes outperforming standard Transformers with softmax attention when trained at scale. In …
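To make the recurrent view in the abstract concrete, here is a minimal sketch of linear attention written as an RNN with a 2D (matrix-valued) hidden state and a RetNet-style global decay term. This is an illustration of the update rule the abstract refers to, not the paper's gated (data-dependent) variant or its hardware-efficient implementation; the function name, single-head notation, and the decay value `gamma` are assumptions for the example.

```python
import torch

def linear_attention_rnn(q, k, v, gamma=0.9):
    """Recurrent form of linear attention with a global decay term.

    q, k, v: (seq_len, d) tensors for a single head.
    gamma: scalar decay in (0, 1), illustrating the RetNet-style
           global decay mentioned in the abstract.
    Returns: (seq_len, d) outputs.
    """
    seq_len, d = q.shape
    S = torch.zeros(d, d)              # 2D (matrix-valued) hidden state
    outputs = []
    for t in range(seq_len):
        # additive RNN update with global decay: S_t = gamma * S_{t-1} + k_t^T v_t
        S = gamma * S + torch.outer(k[t], v[t])
        # readout: o_t = q_t S_t -> constant cost per token, so inference is
        # linear in sequence length
        outputs.append(q[t] @ S)
    return torch.stack(outputs)

# usage sketch
q = torch.randn(16, 8)
k = torch.randn(16, 8)
v = torch.randn(16, 8)
out = linear_attention_rnn(q, k, v)
print(out.shape)  # torch.Size([16, 8])
```

The same computation can be unrolled into a parallel, attention-like form for training, which is what makes linear attention amenable to efficient parallel training while retaining this linear-time recurrent mode at inference.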