http://arxiv.org/abs/2201.11838

Jan. 31, 2022, 2:10 a.m. | Yikuan Li, Ramsey M. Wehbe, Faraz S. Ahmad, Hanyin Wang, Yuan Luo

cs.CL updates on arXiv.org

Transformer-based models, such as BERT, have dramatically improved
performance on various natural language processing tasks. The clinical
knowledge-enriched model ClinicalBERT has also achieved state-of-the-art
results on clinical named entity recognition and natural
language inference tasks. One of the core limitations of these transformers is
their substantial memory consumption, which stems from the full self-attention mechanism.
To overcome this, long-sequence transformer models, e.g. Longformer and
BigBird, were proposed with a sparse attention mechanism to reduce …
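
The memory argument above can be illustrated with a rough count of attention scores per head. The sketch below is only an approximation under stated assumptions: it models a plain sliding-window pattern, which is one component of the sparse attention used by models such as Longformer and BigBird, and it is not taken from the paper or from either model's implementation. The function names and the window size of 512 are hypothetical choices for illustration.

```python
def full_attention_entries(seq_len: int) -> int:
    """Query-key scores per head under full self-attention: one per token pair."""
    return seq_len * seq_len


def windowed_attention_entries(seq_len: int, window: int) -> int:
    """Scores per head when each token attends only to positions within
    window // 2 on either side of itself, clipped at the sequence edges.

    Assumption: a pure sliding window, with no global or random tokens.
    """
    half = window // 2
    total = 0
    for i in range(seq_len):
        lo = max(0, i - half)
        hi = min(seq_len - 1, i + half)
        total += hi - lo + 1
    return total


if __name__ == "__main__":
    # Illustrative sequence lengths; 512 is the usual BERT input limit.
    for n in (512, 4096):
        full = full_attention_entries(n)
        local = windowed_attention_entries(n, window=512)
        print(f"seq_len={n}: full={full}, windowed={local}, ratio={full / local:.1f}")
```

With a fixed window, the windowed count grows roughly linearly in the sequence length, while the full count grows quadratically, which is why the gap widens for long clinical documents.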


