Web: http://arxiv.org/abs/2201.11258

Jan. 28, 2022, 2:10 a.m. | Hwichan Kim, Sangwhan Moon, Naoaki Okazaki, Mamoru Komachi

cs.CL updates on arXiv.org

South and North Korea both use the Korean language. However, Korean NLP
research has focused only on South Korean, and existing NLP systems for the
Korean language, such as neural machine translation (NMT) models, cannot
properly handle North Korean inputs. Training a model using North Korean data
is the most straightforward approach to solving this problem, but there is
insufficient data to train NMT models. In this study, we create data for North
Korean NMT models using a comparable corpus. …

