May 25, 2022, 1:11 a.m. | Tao Lei, Ran Tian, Jasmijn Bastings, Ankur P. Parikh

cs.CL updates on arXiv.org

In this work, we explore whether modeling recurrence into the Transformer
architecture can be both beneficial and efficient, by building an extremely
simple recurrent module into the Transformer. We compare our model to baselines
following the training and evaluation recipe of BERT. Our results confirm that
recurrence can indeed improve Transformer models by a consistent margin,
without requiring low-level performance optimizations, and while keeping the
number of parameters constant. For example, our base model achieves an absolute
improvement of 2.1 …
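To make the idea concrete, here is a minimal sketch (not the authors' exact architecture) of inserting a simple element-wise recurrence into a standard Transformer encoder layer. The recurrent cell, the class names SimpleRecurrence and RecurrentTransformerLayer, and the placement of the recurrence before attention are all illustrative assumptions; unlike the paper's setup, this sketch does not hold the parameter count constant.

```python
# Hypothetical sketch: a lightweight gated recurrence added to a pre-norm
# Transformer layer. Illustrative only; not the paper's exact module.
import torch
import torch.nn as nn


class SimpleRecurrence(nn.Module):
    """Element-wise recurrence: h_t = f_t * h_{t-1} + (1 - f_t) * x_t."""

    def __init__(self, d_model: int):
        super().__init__()
        self.forget = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        f = torch.sigmoid(self.forget(x))   # per-step forget gates
        h = torch.zeros_like(x[:, 0])       # initial hidden state
        outputs = []
        for t in range(x.size(1)):          # sequential scan over time steps
            h = f[:, t] * h + (1.0 - f[:, t]) * x[:, t]
            outputs.append(h)
        return torch.stack(outputs, dim=1)


class RecurrentTransformerLayer(nn.Module):
    """Pre-norm Transformer encoder layer with a recurrence before attention."""

    def __init__(self, d_model: int = 768, n_heads: int = 12, d_ff: int = 3072):
        super().__init__()
        self.recurrence = SimpleRecurrence(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Recurrence feeds the self-attention block; residuals are preserved.
        y = self.norm1(x)
        y = self.recurrence(y)
        y, _ = self.attn(y, y, y, need_weights=False)
        x = x + y
        x = x + self.ff(self.norm2(x))
        return x


if __name__ == "__main__":
    layer = RecurrentTransformerLayer()
    tokens = torch.randn(2, 16, 768)        # (batch, seq_len, d_model)
    print(layer(tokens).shape)              # torch.Size([2, 16, 768])
```

The sequential Python loop is the simplest way to express the scan; the paper's point is that even without low-level performance optimizations of the recurrent step, the modified model remains practical to train under a BERT-style recipe.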

arxiv language language models
