April 25, 2024, 5:45 p.m. | Ido Amos, Jonathan Berant, Ankit Gupta

cs.CL updates on arXiv.org

arXiv:2310.02980v3 Announce Type: replace-cross
Abstract: Modeling long-range dependencies across sequences is a longstanding goal in machine learning and has led to architectures, such as state space models, that dramatically outperform Transformers on long sequences. However, these impressive empirical gains have largely been demonstrated on benchmarks (e.g., Long Range Arena), where models are randomly initialized and trained to predict a target label from an input sequence. In this work, we show that random initialization leads to gross overestimation of …

