June 14, 2024, 1:44 a.m. | Maciej Pióro, Maciej Wołczyk, Razvan Pascanu, Johannes von Oswald, João Sacramento

cs.LG updates on arXiv.org

arXiv:2406.08423v1 Announce Type: new
Abstract: A new breed of gated-linear recurrent neural networks has reached state-of-the-art performance on a range of sequence modeling problems. Such models naturally handle long sequences efficiently, as the cost of processing a new input is independent of sequence length. Here, we explore another advantage of these stateful sequence models, inspired by the success of model merging through parameter interpolation. Building on parallels between fine-tuning and in-context learning, we investigate whether we can treat internal states …
