all AI news
Replicable Benchmarking of Neural Machine Translation (NMT) on Low-Resource Local Languages in Indonesia. (arXiv:2311.00998v1 [cs.CL])
cs.CL updates on arXiv.org arxiv.org
Neural machine translation (NMT) for low-resource local languages in
Indonesia faces significant challenges, including the need for a representative
benchmark and limited data availability. This work addresses these challenges
by comprehensively analyzing training NMT systems for four low-resource local
languages in Indonesia: Javanese, Sundanese, Minangkabau, and Balinese. Our
study encompasses various training approaches, paradigms, data sizes, and a
preliminary study into using large language models for synthetic low-resource
languages parallel data generation. We reveal specific trends and insights into
practical …
arxiv availability benchmark benchmarking challenges data indonesia languages low machine machine translation neural machine translation systems training translation work