Sept. 1, 2022, 1:13 a.m. | Zewei Sun, Shujian Huang, Xin-Yu Dai, Jiajun Chen

cs.CL updates on arXiv.org arxiv.org

Recent studies show that the attention heads in the Transformer are not equal. We
relate this phenomenon to the imbalanced training of multi-head attention and
the model's dependence on specific heads. To tackle this problem, we propose a
simple masking method, HeadMask, in two specific variants. Experiments show that
translation improvements are achieved on multiple language pairs. Subsequent
empirical analyses also support our assumption and confirm the effectiveness of
the method.
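The abstract does not spell out the masking procedure, so the sketch below is an illustration only, not the authors' implementation: it randomly zeroes out the outputs of a few attention heads during training so the model cannot over-rely on specific heads. The class name `MaskedMultiHeadAttention` and the `n_masked_heads` hyperparameter are hypothetical choices for this sketch.

```python
# Minimal sketch (assumed, not the paper's code) of random head masking in
# multi-head self-attention: during training, a few randomly chosen heads are
# silenced before the output projection.
import torch
import torch.nn as nn


class MaskedMultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, n_masked_heads: int = 2):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.n_masked_heads = n_masked_heads  # hypothetical hyperparameter
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)

        # Scaled dot-product attention, computed per head.
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        attn = scores.softmax(dim=-1)
        heads = attn @ v  # (batch, n_heads, seq_len, d_head)

        if self.training and self.n_masked_heads > 0:
            # Randomly pick heads to silence for this forward pass.
            masked = torch.randperm(self.n_heads, device=x.device)[: self.n_masked_heads]
            head_mask = torch.ones(self.n_heads, device=x.device)
            head_mask[masked] = 0.0
            heads = heads * head_mask.view(1, self.n_heads, 1, 1)

        out = heads.transpose(1, 2).reshape(b, t, self.n_heads * self.d_head)
        return self.out_proj(out)


# Usage example:
# layer = MaskedMultiHeadAttention(d_model=512, n_heads=8, n_masked_heads=2)
# y = layer(torch.randn(2, 10, 512))
```

Masking only during training, as above, is one plausible reading of the method; the paper's two specific variants may differ in how the masked heads are selected.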

Tags: arxiv, attention, inequality, machine translation, neural machine translation
