Web: http://arxiv.org/abs/2201.11838

Jan. 31, 2022, 2:10 a.m. | Yikuan Li, Ramsey M. Wehbe, Faraz S. Ahmad, Hanyin Wang, Yuan Luo

cs.CL updates on arXiv.org

Transformer-based models, such as BERT, have dramatically improved
performance on a range of natural language processing tasks. The
clinical-knowledge-enriched model ClinicalBERT likewise achieved
state-of-the-art results on clinical named entity recognition and natural
language inference tasks. A core limitation of these transformers is their
substantial memory consumption, caused by the full self-attention mechanism.
To overcome this, long-sequence transformer models such as Longformer and
BigBird were proposed, using a sparse attention mechanism to reduce …
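The truncated abstract points at the key trade-off: full self-attention materializes an n × n score matrix, so memory grows quadratically with sequence length, while a sliding-window sparse pattern (the kind Longformer builds on) keeps only about n × (2w + 1) useful scores for a window size w. The PyTorch sketch below illustrates that contrast; it is not the paper's implementation, the window size is illustrative, and for clarity the sparse variant still builds the dense matrix and masks it, whereas production kernels avoid materializing it.

```python
# Minimal sketch contrasting full self-attention with a sliding-window
# sparse variant. Not the paper's code; shapes and window size are
# illustrative only.
import torch

def full_attention(q, k, v):
    # Scores for every pair of positions: an (n, n) matrix, so memory
    # grows quadratically with sequence length n.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

def sliding_window_attention(q, k, v, w=2):
    # Each position attends only to neighbors within distance w, so the
    # number of useful scores grows linearly in n, roughly n * (2w + 1).
    # NOTE: for simplicity this still materializes the dense (n, n)
    # matrix and masks it; real sparse-attention kernels never build it.
    n = q.shape[-2]
    idx = torch.arange(n)
    mask = (idx[:, None] - idx[None, :]).abs() > w
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

n, d = 16, 8
q = k = v = torch.randn(n, d)
print(full_attention(q, k, v).shape)            # torch.Size([16, 8])
print(sliding_window_attention(q, k, v).shape)  # torch.Size([16, 8])
```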

arxiv transformers
