How Effective is Byte Pair Encoding for Out-Of-Vocabulary Words in Neural Machine Translation?. (arXiv:2208.05225v1 [cs.CL])
Aug. 11, 2022, 1:11 a.m. | Ali Araabi, Christof Monz, Vlad Niculae
cs.CL updates on arXiv.org arxiv.org
Neural Machine Translation (NMT) is an open-vocabulary problem. As a result,
dealing with words that do not occur during training (a.k.a. out-of-vocabulary
(OOV) words) has long been a fundamental challenge for NMT systems. The
predominant method to tackle this problem is Byte Pair Encoding (BPE), which
splits words, including OOV words, into sub-word segments. BPE has achieved
impressive results for a wide range of translation tasks in terms of automatic
evaluation metrics. While it is often assumed that by using …
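To show how BPE can turn an unseen word into known sub-word segments, here is a minimal Python sketch of the standard greedy merge-learning loop. It is an illustration under stated assumptions, not the paper's experimental setup. The function names (learn_bpe, merge_pair, apply_bpe), the end-of-word marker "</w>", and the toy word-frequency corpus are all hypothetical choices made for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols`, scanning left to right."""
    merged, i = [], 0
    while i < len(symbols):
        if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_counts, num_merges):
    """Learn an ordered list of merge operations from a {word: frequency} dict (hypothetical helper)."""
    # Each word starts as a sequence of characters plus an assumed end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties go to the pair counted first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def apply_bpe(word, merges):
    """Segment a (possibly out-of-vocabulary) word by applying merges in learned order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}  # toy data, not from the paper
    merges = learn_bpe(corpus, num_merges=5)
    # "lowest" never appears in the toy corpus, yet it splits into known sub-words.
    print(apply_bpe("lowest", merges))  # ['low', 'est</w>']
```

In this sketch the OOV word "lowest" comes out as the two segments "low" and "est</w>", both built from merges learned on in-vocabulary words. This is the mechanism the abstract refers to when it says BPE splits OOV words into sub-word segments.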