March 25, 2024, 4:46 a.m. | Enora Rice, Ali Marashian, Luke Gessler, Alexis Palmer, Katharina von der Wense

cs.CL updates on arXiv.org

arXiv:2403.14840v1 Announce Type: new
Abstract: Canonical morphological segmentation is the process of analyzing words into the standard (aka underlying) forms of their constituent morphemes. This is a core task in language documentation, and NLP systems have the potential to dramatically speed up this process. But in typical language documentation settings, training data for canonical morpheme segmentation is scarce, making it difficult to train high quality models. However, translation data is often much more abundant, and, in this work, we present …
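
The difference between canonical and surface segmentation can be shown with a small sketch. The example word "achievability", its segmentations, and the helper `is_surface_segmentation` are illustrative assumptions, not taken from the paper. A surface segmentation splits the word into substrings that concatenate back to it exactly. A canonical segmentation instead restores each morpheme's standard (underlying) form, so its pieces generally do not concatenate back to the original word.

```python
# Illustrative only: the word, segmentations, and helper are hypothetical,
# not drawn from the paper's data or method.

def is_surface_segmentation(word: str, morphs: list[str]) -> bool:
    """Return True if the pieces concatenate back to the word exactly."""
    return "".join(morphs) == word


word = "achievability"
surface = ["achiev", "abil", "ity"]     # substrings of the word itself
canonical = ["achieve", "able", "ity"]  # standard (underlying) morpheme forms

print(is_surface_segmentation(word, surface))    # True
print(is_surface_segmentation(word, canonical))  # False
```

Because canonical forms can differ from the surface string (here "achiev" becomes "achieve" and "abil" becomes "able"), a model has to generate the forms rather than just choose split points. This is part of why scarce training data makes the task hard.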