Web: https://www.reddit.com/r/LanguageTechnology/comments/ses9wt/confusion_about_bert_masking/

Jan. 28, 2022, 2:39 p.m. | /u/mrtac96

Natural Language Processing reddit.com

I am trying to understand the masking in the BERT model.

I am confused by the following lines taken from the paper:

The training data generator chooses 15% of the token positions at random for prediction. If the i-th token is chosen, we replace the i-th token with (1) the [MASK] token 80% of the time, (2) a random token 10% of the time, (3) the unchanged i-th token 10% of the time.

At point 3 it says unchanged token (I think it …
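The 80/10/10 rule quoted above can be sketched in a few lines. The key detail is that all 15% of chosen positions remain prediction targets, including the 10% that are left unchanged; only the input may be altered, while the label is always the original token. This is a minimal illustrative sketch (the names `mask_tokens`, `MASK`, and the toy vocabulary are my own, not from the paper or any BERT library):

```python
import random

MASK = "[MASK]"
VOCAB = ["a", "b", "c", "d", "e"]  # toy vocabulary for illustration

def mask_tokens(tokens, mask_prob=0.15):
    """BERT-style masking: choose ~15% of positions as prediction
    targets, then replace each chosen token with [MASK] 80% of the
    time, a random token 10% of the time, and leave it unchanged
    10% of the time. The label is always the original token."""
    inputs = list(tokens)
    labels = [None] * len(tokens)  # None = position is not predicted
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            labels[i] = tok  # model must predict the original token here
            r = random.random()
            if r < 0.8:
                inputs[i] = MASK               # case (1): [MASK]
            elif r < 0.9:
                inputs[i] = random.choice(VOCAB)  # case (2): random token
            # else case (3): input left unchanged, but still predicted
    return inputs, labels
```

Note that in case (3) `inputs[i]` equals `labels[i]`: the model still has to predict that position, which is what the paper means by "the unchanged i-th token".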

