Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
April 25, 2024, 5:45 p.m. | Ido Amos, Jonathan Berant, Ankit Gupta
cs.CL updates on arXiv.org arxiv.org
Abstract: Modeling long-range dependencies across sequences is a longstanding goal in machine learning and has led to architectures, such as state space models, that dramatically outperform Transformers on long sequences. However, these impressive empirical gains have been by and large demonstrated on benchmarks (e.g. Long Range Arena), where models are randomly initialized and trained to predict a target label from an input sequence. In this work, we show that random initialization leads to gross overestimation of …
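The abstract contrasts two evaluation setups: training a model from random initialization versus first fitting a data-driven prior (e.g. self-supervised pretraining on the task's own inputs) and fine-tuning from there. A minimal numpy sketch of that distinction, with entirely hypothetical toy data and a linear next-step predictor standing in for the pretraining objective:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an LRA-style task: sequences of token embeddings
# with binary labels (all names and sizes here are illustrative).
n, seq_len, dim = 64, 32, 8
X = rng.normal(size=(n, seq_len, dim))
y = (X.mean(axis=(1, 2)) > 0).astype(int)

def pretrain_prior(X, steps=200, lr=0.1):
    """Data-driven prior: fit a linear next-step predictor on the task's
    own inputs (no labels used), and reuse its weights as the init."""
    W = np.zeros((dim, dim))
    for _ in range(steps):
        t = rng.integers(0, seq_len - 1)
        x_t, x_next = X[:, t, :], X[:, t + 1, :]
        grad = x_t.T @ (x_t @ W - x_next) / len(X)  # MSE gradient
        W -= lr * grad
    return W

# The two initializations being compared: train-from-scratch vs. a
# prior estimated from the downstream inputs themselves.
W_random = rng.normal(scale=0.1, size=(dim, dim))
W_prior = pretrain_prior(X)

def features(X, W):
    # Pooled sequence representation fed to a downstream classifier.
    return (X @ W).mean(axis=1)

f_random = features(X, W_random)
f_prior = features(X, W_prior)
```

The sketch only mirrors the experimental contrast described in the abstract, not the paper's actual models or training protocol: the claim being tested is that architecture comparisons are only fair when both models receive the data-driven initialization.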