Aug. 12, 2022, 1:11 a.m. | Muhammad ElNokrashy (1), Amr Hendy (1), Mohamed Maher (1), Mohamed Afify (1), Hany Hassan Awadalla (2) ((1) Microsoft ATL Cairo, (2) Microsoft Redmond)

cs.CL updates on arXiv.org

This paper proposes a simple yet effective method to improve direct (X-to-Y)
translation in two settings: zero-shot, and when direct data is available. We
modify the input tokens at both the encoder and decoder to include signals for
the source and target languages. We show a performance gain both when training
from scratch and when finetuning a pretrained model with the proposed setup. In
the experiments, our method shows a gain of nearly 10.0 BLEU points on in-house
datasets, depending on the checkpoint selection …
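
To make the idea of adding language signals at both the encoder and decoder concrete, here is a minimal sketch. The abstract does not say how the tokens are formatted or where they are placed, so the `<xx>` tag format, the choice to prepend both tags on each side, and the names `lang_token` and `add_language_tokens` are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def lang_token(code: str) -> str:
    """Build a language tag such as "<de>" (hypothetical tag format)."""
    return f"<{code}>"


def add_language_tokens(
    src_tokens: Sequence[str],
    tgt_tokens: Sequence[str],
    src_lang: str,
    tgt_lang: str,
) -> Tuple[List[str], List[str]]:
    """Prepend source and target language tags to both model inputs.

    Assumption: both tags go at the front of the encoder input and the
    decoder input. The abstract only says that signals for the source and
    target languages are included on both sides.
    """
    signals = [lang_token(src_lang), lang_token(tgt_lang)]
    encoder_input = signals + list(src_tokens)
    decoder_input = signals + list(tgt_tokens)
    return encoder_input, decoder_input


# Example: a direct German-to-French pair (no English pivot).
enc, dec = add_language_tokens(["Guten", "Morgen"], ["Bonjour"], "de", "fr")
print(enc)
print(dec)
```

In a real system the tags would typically be registered as special tokens in the vocabulary so the tokenizer does not split them; that step is omitted here.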

