Confusion about BERT masking?
I am trying to understand the masking in the BERT model.
I am confused by the following passage taken from the paper:
The training data generator chooses 15% of the token positions at random for prediction. If the i-th token is chosen, we replace the i-th token with (1) the [MASK] token 80% of the time (2) a random token 10% of the time (3) the unchanged i-th token 10% of the time
At point 3 it says unchanged token (I think it …
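To make the quoted 15% / 80-10-10 rule concrete, here is a minimal sketch in plain Python. It is not the paper's actual data generator: the function name `mask_tokens`, the toy `vocab`, and the use of `None` for "not a prediction target" are all assumptions made for illustration, and selecting each position independently with probability 0.15 is a simplification of choosing 15% of the positions.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, select_prob=0.15, rng=None):
    """Hypothetical helper: apply the quoted masking rule to a token list.

    Returns (inputs, labels). labels[i] holds the original token at every
    selected position and None everywhere else.
    """
    rng = rng or random.Random()
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        # Assumption: each position is selected independently (about 15%).
        if rng.random() >= select_prob:
            continue
        labels[i] = tok  # a selected position is always a prediction target
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK               # (1) [MASK] token, 80% of the time
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # (2) random token, 10% of the time
        # (3) otherwise the input token is left unchanged, 10% of the time
    return inputs, labels

tokens = "my dog is hairy and my cat is fluffy".split()
vocab = ["apple", "run", "blue"]  # toy vocabulary, an assumption
inputs, labels = mask_tokens(tokens, vocab, rng=random.Random(0))
print(inputs)
print(labels)
```

In this sketch, case (3) leaves the input token as it was, but `labels[i]` is still set, so the position still counts as one of the chosen positions "for prediction" in the quoted sentence.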