Web: http://arxiv.org/abs/2201.12219

Jan. 31, 2022, 2:10 a.m. | Silvia Severini, Ayyoob Imani, Philipp Dufter, Hinrich Schütze

cs.CL updates on arXiv.org

Parallel corpora are ideal for extracting a multilingual named entity (MNE)
resource, i.e., a dataset of names translated into multiple languages. Prior
work on extracting MNE datasets from parallel corpora required resources such
as large monolingual corpora or word aligners that are unavailable or perform
poorly for underresourced languages. We present CLC-BN, a new method for
creating an MNE resource, and apply it to the Parallel Bible Corpus, a corpus
of more than 1000 languages. CLC-BN learns a neural transliteration …

