Feb. 19, 2024, 5:48 a.m. | Matthias Lindemann, Alexander Koller, Ivan Titov

cs.CL updates on arXiv.org

arXiv:2310.00796v2 Announce Type: replace
Abstract: Strong inductive biases enable learning from little data and help generalization outside the training distribution. On their own, popular neural architectures such as Transformers lack strong structural inductive biases for seq2seq NLP tasks. Consequently, they struggle with systematic generalization beyond the training distribution, e.g., extrapolating to longer inputs, even when pre-trained on large amounts of text. We show how a structural inductive bias can be efficiently injected into a seq2seq model by pre-training …
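The abstract is cut off before it specifies the pre-training procedure, so the following is only a minimal illustrative sketch of the general recipe it points to: pre-train a small seq2seq Transformer on synthetic, structured string transductions so that it acquires a structural bias before being fine-tuned on the real task. The character-substitution task, the Hugging Face T5 components, and all hyperparameters below are assumptions for illustration, not the authors' setup.

```python
# Illustrative sketch (not the paper's implementation): pre-train a small
# seq2seq Transformer on synthetic string transductions, then fine-tune it
# on the downstream seq2seq task as usual.
import random
import torch
from transformers import T5Config, T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")  # reuse an existing vocab

# Tiny randomly initialized model, so any structural bias comes from the
# synthetic pre-training rather than from scale or text pre-training.
config = T5Config(vocab_size=tokenizer.vocab_size, d_model=128,
                  num_layers=2, num_decoder_layers=2, num_heads=4, d_ff=256)
model = T5ForConditionalGeneration(config)

def sample_transduction_example():
    """Synthesize one (task description + input, output) pair for a random
    character-substitution map. This is only a crude stand-in for the kind of
    structural transformation one might pre-train on."""
    alphabet = list("abcdefgh")
    mapping = dict(zip(alphabet, random.sample(alphabet, len(alphabet))))
    x = "".join(random.choices(alphabet, k=random.randint(5, 12)))
    y = "".join(mapping[c] for c in x)
    spec = " ".join(f"{k}>{v}" for k, v in mapping.items())  # task spec given in-context
    return f"rules: {spec} input: {x}", y

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
for step in range(1000):  # synthetic pre-training loop
    sources, targets = zip(*(sample_transduction_example() for _ in range(16)))
    batch = tokenizer(list(sources), padding=True, return_tensors="pt")
    labels = tokenizer(list(targets), padding=True, return_tensors="pt").input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# After this step, `model` would be fine-tuned on the target seq2seq task,
# the point being that it now carries a bias toward structured transductions.
```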

