March 7, 2024, 5:47 a.m. | Séamus Lankford, Haithem Afli, Órla Ní Loinsigh, Andy Way

cs.CL updates on arXiv.org

arXiv:2403.03575v1 Announce Type: new
Abstract: Machine Translation is a mature technology for many high-resource language pairs. However, in the context of low-resource languages, there is a paucity of parallel datasets available for developing translation models. Furthermore, the development of datasets for low-resource languages often focuses on simply creating the largest possible dataset for generic translation. The benefits and development of smaller in-domain datasets can easily be overlooked. To assess the merits of using in-domain data, a dataset for the …

