June 13, 2022, 9:14 a.m. | /u/mani-rai

Natural Language Processing www.reddit.com

Hugging Face states that:

>It is based on Facebook’s RoBERTa model released in 2019. It is a large multi-lingual language model, trained on 2.5TB of filtered CommonCrawl data.

While the XLM-R paper states:

>We follow the XLM approach as closely as possible, only introducing changes that improve performance at scale.

The confusion is that RoBERTa uses dynamic masking whereas XLM uses static masking. Also, RoBERTa uses a maximum input length of 512 tokens while XLM uses 256. Also, I didn't understand the following XLM …
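To make the two differences concrete, here is a minimal sketch (my own illustration, not code from either paper), assuming the Hugging Face `transformers` library and the public `xlm-roberta-base` checkpoint: the released config shows the RoBERTa-style 512-token input length, and `DataCollatorForLanguageModeling` re-samples the mask on every batch, which is what "dynamic masking" means in RoBERTa.

```python
from transformers import AutoConfig, AutoTokenizer, DataCollatorForLanguageModeling

# 1) Maximum input length: the released XLM-R checkpoint follows RoBERTa's
#    512-token limit (514 position embeddings = 512 tokens plus the two
#    offset positions RoBERTa reserves), not XLM's 256.
config = AutoConfig.from_pretrained("xlm-roberta-base")
print(config.max_position_embeddings)  # 514

# 2) Dynamic masking: instead of masking the corpus once at preprocessing
#    time (static masking), the mask pattern is drawn fresh every time a
#    batch is built, so each epoch sees different masked positions.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

encoded = tokenizer(["Hello world", "Bonjour le monde"], padding=True, return_tensors="pt")
features = [{k: v[i] for k, v in encoded.items()} for i in range(2)]

batch1 = collator(features)
batch2 = collator(features)
# batch1["labels"] and batch2["labels"] will generally differ: the masks are
# re-sampled on every call rather than fixed once, i.e. dynamic masking.
```

This only shows what the released model and the standard data collator do; it does not settle which pre-training recipe (XLM-style or RoBERTa-style) the authors actually used, which is the question above.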

languagetechnology roberta
