Alleviating the Inequality of Attention Heads for Neural Machine Translation. (arXiv:2009.09672v2 [cs.CL] UPDATED)
Sept. 1, 2022, 1:13 a.m. | Zewei Sun, Shujian Huang, Xin-Yu Dai, Jiajun Chen
cs.CL updates on arXiv.org arxiv.org
Recent studies show that the attention heads in the Transformer are not equally important. We relate this phenomenon to the imbalanced training of multi-head attention and the model's dependence on specific heads. To tackle this problem, we propose a simple masking method, HeadMask, in two specific variants. Experiments show that translation improvements are achieved on multiple language pairs. Subsequent empirical analyses also support our assumption and confirm the effectiveness of the method.
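
The abstract does not spell out the two HeadMask variants, so the following is only an illustrative sketch of the general idea: randomly zeroing out whole attention heads during training so the model cannot come to depend on any specific head. The class name, the mask_ratio parameter, and the dropout-style rescaling are assumptions for illustration, not the authors' implementation.

# Minimal sketch of training-time head masking, assuming a random-masking
# variant. Not the paper's exact HeadMask method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeadMaskedSelfAttention(nn.Module):
    """Self-attention whose heads are randomly masked during training."""

    def __init__(self, embed_dim, num_heads, mask_ratio=0.25):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.mask_ratio = mask_ratio  # fraction of heads dropped per step (assumed)
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):  # x: (batch, seq, embed_dim)
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape each projection to (batch, heads, seq, head_dim).
        split = lambda z: z.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)
        attn = F.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        heads = attn @ v  # (batch, heads, seq, head_dim)
        if self.training and self.mask_ratio > 0:
            # Zero a random subset of heads before out_proj mixes them,
            # so no single head can dominate the trained model.
            keep = (torch.rand(self.num_heads, device=x.device)
                    >= self.mask_ratio).float()
            # Dropout-style rescaling keeps the expected output magnitude.
            keep = keep / keep.mean().clamp(min=1e-6)
            heads = heads * keep.view(1, -1, 1, 1)
        out = heads.transpose(1, 2).reshape(b, t, d)
        return self.out_proj(out)

# Example: layer = HeadMaskedSelfAttention(512, 8); y = layer(torch.randn(2, 10, 512))

The key design point is that the mask is applied to whole heads before the output projection; masking after the projection would not isolate individual heads, since the projection mixes their outputs.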