May 3, 2024, 9:47 a.m. | /u/SeawaterFlows

Machine Learning | www.reddit.com

**Paper**: [https://arxiv.org/abs/2404.07904](https://arxiv.org/abs/2404.07904)

**Code**: [https://github.com/OpenNLPLab/HGRN2](https://github.com/OpenNLPLab/HGRN2)

**Standalone code** (1): [https://github.com/Doraemonzzz/hgru2-pytorch](https://github.com/Doraemonzzz/hgru2-pytorch)

**Standalone code** (2): [https://github.com/sustcsonglin/flash-linear-attention/tree/main/fla/models/hgrn2](https://github.com/sustcsonglin/flash-linear-attention/tree/main/fla/models/hgrn2)

**Abstract**:

>**Hierarchically gated linear RNN** (**HGRN**, Qin et al. 2023) has demonstrated competitive training speed and performance in language modeling, while offering efficient inference. However, the recurrent state size of HGRN remains relatively small, which limits its expressiveness. To address this issue, inspired by linear attention, we introduce a simple outer-product-based state expansion mechanism so that the recurrent state size can be significantly enlarged without introducing any additional …
