gaHealth: An English-Irish Bilingual Corpus of Health Data
March 7, 2024, 5:47 a.m. | Séamus Lankford, Haithem Afli, Órla Ní Loinsigh, Andy Way
cs.CL updates on arXiv.org
Abstract: Machine Translation is a mature technology for many high-resource language pairs. However, in the context of low-resource languages, there is a paucity of parallel datasets available for developing translation models. Furthermore, the development of datasets for low-resource languages often focuses on simply creating the largest possible dataset for generic translation. The benefits and development of smaller in-domain datasets can easily be overlooked. To assess the merits of using in-domain data, a dataset for the …