May 27, 2022, 6:11 a.m. | /u/GrammarPaparazzi

Natural Language Processing www.reddit.com

Hey!

So I have been working closely with DataCollators, and I cannot work out whether the DataCollatorForLanguageModeling class masks statically or dynamically. For reference, here is the [code](https://github.com/huggingface/transformers/blob/main/src/transformers/data/data_collator.py#:~:text=def%20torch_mask_tokens(self%2C%20inputs%3A%20Any%2C%20special_tokens_mask%3A%20Optional%5BAny%5D%20%3D%20None)%20%2D%3E%20Tuple%5BAny%2C%20Any%5D%3A) for the class in question. My reading so far is that it does simple static masking, but this [issue](https://github.com/huggingface/transformers/issues/5979) posted on the repo says otherwise. If anyone has experience with this, please help out, and if it is indeed static masking, how does one …
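For what it's worth, the distinction hinges on *when* the masking is applied: the collator runs once per batch, every time a batch is assembled, so the same example can receive a different mask each epoch. Here is a minimal, self-contained sketch of the 80/10/10 MLM masking scheme as I understand it from `torch_mask_tokens` (the `MASK_ID` and `VOCAB_SIZE` values are placeholders, not pulled from any real tokenizer, and `random` stands in for the torch sampling in the actual implementation):

```python
import random

MASK_ID = 103        # hypothetical [MASK] token id (BERT happens to use 103)
VOCAB_SIZE = 30522   # hypothetical vocabulary size

def mask_tokens(input_ids, mlm_probability=0.15, seed=None):
    """Sketch of MLM masking as applied at *collation* time.

    Because this runs every time a batch is built, the same example
    is re-masked on every pass -- which is what makes the scheme
    dynamic rather than static.
    """
    rng = random.Random(seed)
    inputs = list(input_ids)
    labels = [-100] * len(inputs)      # -100 = position ignored by the loss
    for i, tok in enumerate(inputs):
        if rng.random() < mlm_probability:
            labels[i] = tok            # model must predict the original token
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID                  # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
            # remaining 10%: keep the original token unchanged
    return inputs, labels

# Two collation passes over the same example generally mask
# different positions -- static masking would mask once, up front.
example = list(range(1000, 1100))
pass_1 = mask_tokens(example, seed=1)
pass_2 = mask_tokens(example, seed=2)
```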

languagetechnology
