Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
April 25, 2024, 5:45 p.m. | Ido Amos, Jonathan Berant, Ankit Gupta
Source: cs.CL updates on arXiv.org
Abstract: Modeling long-range dependencies across sequences is a longstanding goal in machine learning and has led to architectures, such as state space models, that dramatically outperform Transformers on long sequences. However, these impressive empirical gains have been by and large demonstrated on benchmarks (e.g. Long Range Arena), where models are randomly initialized and trained to predict a target label from an input sequence. In this work, we show that random initialization leads to gross overestimation of …
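The abstract's core claim is about evaluation protocol: rather than randomly initializing a model and training it directly on the benchmark labels, first pretrain it self-supervised on the task's own input sequences (a data-driven prior), then fine-tune on the labels. A minimal sketch of that two-phase protocol, using a toy NumPy next-token-prediction model in place of a real state space model or Transformer (all data, dimensions, and the linear classifier head are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a long-sequence benchmark: integer token sequences
# with binary labels (hypothetical data, not Long Range Arena itself).
VOCAB, SEQ_LEN, DIM, N = 16, 32, 8, 64
X = rng.integers(0, VOCAB, size=(N, SEQ_LEN))
y = (X.sum(axis=1) % 2).astype(float)

# Phase 1: self-supervised pretraining on the task inputs themselves
# (next-token prediction). The embedding table learned here is the
# "data-driven prior" replacing random initialization.
emb = rng.normal(scale=0.1, size=(VOCAB, DIM))
W_lm = rng.normal(scale=0.1, size=(DIM, VOCAB))
lr = 0.1
for _ in range(20):
    for seq in X:
        for t in range(SEQ_LEN - 1):
            h = emb[seq[t]]
            logits = h @ W_lm
            p = np.exp(logits - logits.max())
            p /= p.sum()
            p[seq[t + 1]] -= 1.0          # dCE/dlogits = softmax - onehot
            W_lm -= lr * np.outer(h, p)
            emb[seq[t]] -= lr * (W_lm @ p)

# Phase 2: supervised fine-tuning of a classifier head on mean-pooled
# *pretrained* embeddings, instead of training everything from scratch.
feats = emb[X].mean(axis=1)               # (N, DIM)
w = np.zeros(DIM)
for _ in range(200):
    z = feats @ w
    prob = 1.0 / (1.0 + np.exp(-z))
    w -= 0.5 * feats.T @ (prob - y) / N   # logistic-regression gradient step

acc = float(((feats @ w > 0) == (y > 0.5)).mean())
print(f"train accuracy after pretrain + finetune: {acc:.2f}")
```

The point of the sketch is the ordering, not the model: any comparison between architectures would run both through the same phase-1 pretraining before measuring phase-2 performance, so neither benefits from a random-initialization artifact.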