May 3, 2024, 9:47 a.m. | /u/SeawaterFlows

Machine Learning | www.reddit.com

**Paper**: [https://arxiv.org/abs/2404.07904](https://arxiv.org/abs/2404.07904)

**Code**: [https://github.com/OpenNLPLab/HGRN2](https://github.com/OpenNLPLab/HGRN2)

**Standalone code** (1): [https://github.com/Doraemonzzz/hgru2-pytorch](https://github.com/Doraemonzzz/hgru2-pytorch)

**Standalone code** (2): [https://github.com/sustcsonglin/flash-linear-attention/tree/main/fla/models/hgrn2](https://github.com/sustcsonglin/flash-linear-attention/tree/main/fla/models/hgrn2)

**Abstract**:

>**Hierarchically gated linear RNN** (**HGRN**, Qin et al. 2023) has demonstrated competitive training speed and performance in language modeling, while offering efficient inference. However, the recurrent state size of HGRN remains relatively small, which limits its expressiveness. To address this issue, inspired by linear attention, we introduce a simple outer-product-based state expansion mechanism so that the recurrent state size can be significantly enlarged without introducing any additional …
