Injecting a Structural Inductive Bias into a Seq2Seq Model by Simulation
Feb. 19, 2024, 5:48 a.m. | Matthias Lindemann, Alexander Koller, Ivan Titov
cs.CL updates on arXiv.org
Abstract: Strong inductive biases enable learning from little data and help generalization outside of the training distribution. Popular neural architectures such as Transformers lack strong structural inductive biases for seq2seq NLP tasks on their own. Consequently, they struggle with systematic generalization beyond the training distribution, e.g., extrapolating to longer inputs, even when pre-trained on large amounts of text. We show how a structural inductive bias can be efficiently injected into a seq2seq model by pre-training …