Jan. 20, 2022, 3:48 p.m. | /u/KushnarevaL

Natural Language Processing www.reddit.com

Hello, people. I still have some questions after reading the Big Bird paper ( https://arxiv.org/pdf/2007.14062v2.pdf ) and would be grateful if some Big Bird specialists could help me understand this model better.

  1. Is the distribution of random attention (Figure 1 (a)) fixed in advance for all inputs, or can it differ between inputs even on the same head?
  2. In BIGBIRD-ETC, do they add any additional global tokens aside from [CLS]?
  3. In BIGBIRD-ITC, how is the …

languagetechnology
