May 4, 2022, 1:11 a.m. | Yichong Huang, Xiaocheng Feng, Xinwei Geng, Bing Qin

cs.CL updates on arXiv.org arxiv.org

Although all-in-one-model multilingual neural machine translation (MNMT) has
achieved remarkable progress in recent years, the single overall best checkpoint
it selects fails to achieve the best performance on all language pairs
simultaneously. This is because the best checkpoints for the individual language
pairs (i.e., language-specific best checkpoints) are scattered across different
epochs. In this paper, we present a novel training strategy dubbed Language-Specific
Self-Distillation (LSSD) for bridging the gap between language-specific best
checkpoints and the overall best checkpoint. In detail, we regard …
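Since the abstract is truncated, the details of LSSD are not given here; the sketch below only illustrates the general idea the abstract names, i.e. distilling from language-specific best checkpoints into the current model. The names `lssd_loss`, `alpha`, and the loss weighting are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the paper's code): a language-specific self-distillation
# loss, assuming the best checkpoint for the current language pair acts as a
# frozen teacher for the model being trained on that pair.
import torch
import torch.nn.functional as F


def lssd_loss(student_logits, gold_targets, teacher_logits, alpha=0.5, pad_id=0):
    """Combine the usual translation cross-entropy with a distillation term
    that pulls the student toward the language-specific teacher's distribution.
    `alpha` (assumed here) trades off the two terms."""
    # Standard cross-entropy against the reference targets.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        gold_targets.view(-1),
        ignore_index=pad_id,
    )
    # Self-distillation: KL divergence from the frozen teacher checkpoint
    # that performed best on this language pair.
    kl = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits.detach(), dim=-1),
        reduction="batchmean",
    )
    return (1.0 - alpha) * ce + alpha * kl
```

In practice one would look up the teacher logits from the checkpoint associated with the batch's language pair before calling the loss; how LSSD selects and combines these teachers is described in the full paper.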

Tags: arxiv, distillation, language, machine, machine translation, neural machine translation, translation
