TAMS: Translation-Assisted Morphological Segmentation
March 25, 2024, 4:46 a.m. | Enora Rice, Ali Marashian, Luke Gessler, Alexis Palmer, Katharina von der Wense
cs.CL updates on arXiv.org arxiv.org
Abstract: Canonical morphological segmentation is the process of analyzing words into the standard (aka underlying) forms of their constituent morphemes. This is a core task in language documentation, and NLP systems have the potential to dramatically speed up this process. But in typical language documentation settings, training data for canonical morpheme segmentation is scarce, making it difficult to train high quality models. However, translation data is often much more abundant, and, in this work, we present …
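To make the definition above concrete, the sketch below contrasts a surface segmentation, whose pieces spell the word exactly as written, with a canonical segmentation, whose pieces are the standard (underlying) morpheme forms. The example word, its analyses, and the `Segmentation` class are hypothetical illustrations. They are not taken from the TAMS paper and do not reflect its model or data.

```python
# Hypothetical illustration of canonical vs. surface morphological segmentation.
# The word "unhappiness" and both analyses are assumptions chosen for clarity.
from dataclasses import dataclass


@dataclass
class Segmentation:
    word: str
    surface: list[str]    # pieces that concatenate back to the word as written
    canonical: list[str]  # standard (underlying) forms of the morphemes

    def surface_is_consistent(self) -> bool:
        # A surface segmentation must reproduce the original spelling exactly.
        return "".join(self.surface) == self.word


example = Segmentation(
    word="unhappiness",
    surface=["un", "happi", "ness"],
    canonical=["un", "happy", "ness"],
)

print(example.surface_is_consistent())  # True
print("".join(example.canonical))       # unhappyness
```

The canonical pieces do not simply concatenate back to the written word ("happy" surfaces as "happi"). This is why canonical segmentation requires analysis beyond splitting the string.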