Data Augmentation to Address Out-of-Vocabulary Problem in Low-Resource Sinhala-English Neural Machine Translation. (arXiv:2205.08722v1 [cs.CL])
May 19, 2022, 1:11 a.m. | Aloka Fernando, Surangika Ranathunga
cs.CL updates on arXiv.org arxiv.org
Out-of-Vocabulary (OOV) is a problem for Neural Machine Translation (NMT).
OOV refers to words with a low occurrence in the training data, or to those
that are absent from the training data. To alleviate this, word or phrase-based
Data Augmentation (DA) techniques have been used. However, existing DA
techniques have addressed only one of these OOV types and are limited to
considering either syntactic constraints or semantic constraints. We present a
word and phrase replacement-based DA technique that considers both types …
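
To make the two OOV types in the abstract concrete, here is a minimal Python sketch that separates words absent from the training data from words that occur only rarely. It is an illustration, not the authors' method: the function name `classify_oov`, the whitespace tokenization, and the `rare_threshold` cutoff are all assumptions made for this example.

```python
from collections import Counter


def classify_oov(train_sentences, candidate_words, rare_threshold=1):
    """Split candidate words into the two OOV types described above.

    'unseen' = absent from the training data.
    'rare'   = present, but occurring at most `rare_threshold` times.
    The threshold and whitespace tokenization are illustrative assumptions.
    """
    # Count token frequencies over the (hypothetical) training corpus.
    counts = Counter(tok for sent in train_sentences for tok in sent.split())
    unseen = sorted({w for w in candidate_words if counts[w] == 0})
    rare = sorted({w for w in candidate_words if 0 < counts[w] <= rare_threshold})
    return {"unseen": unseen, "rare": rare}


# Toy English-side corpus, made up for the example.
train = ["the cat sat", "the dog sat", "the bird flew"]
result = classify_oov(train, ["the", "cat", "sat", "horse"], rare_threshold=1)
print(result)
```

In this toy corpus, "horse" never appears and "cat" appears once, so they fall into the unseen and rare groups respectively. A real low-resource setup would use a proper tokenizer and a threshold chosen from the corpus statistics.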