Web: http://arxiv.org/abs/2201.12219

Jan. 31, 2022, 2:10 a.m. | Silvia Severini, Ayyoob Imani, Philipp Dufter, Hinrich Schütze

cs.CL updates on arXiv.org

Parallel corpora are ideal for extracting a multilingual named entity (MNE)
resource, i.e., a dataset of names translated into multiple languages. Prior
work on extracting MNE datasets from parallel corpora required resources such
as large monolingual corpora or word aligners that are unavailable or perform
poorly for underresourced languages. We present CLC-BN, a new method for
creating an MNE resource, and apply it to the Parallel Bible Corpus, a corpus
of more than 1000 languages. CLC-BN learns a neural transliteration …

