March 12, 2024, 4:51 a.m. | Michael Ginn (University of Colorado), Lindia Tjuatja (Carnegie Mellon University), Taiqi He (Carnegie Mellon University), Enora Rice (University of Colorado), Gr

cs.CL updates on arXiv.org

arXiv:2403.06399v1 Announce Type: new
Abstract: A key aspect of language documentation is the creation of annotated text in a format such as interlinear glossed text (IGT), which captures fine-grained morphosyntactic analyses in a morpheme-by-morpheme format. Prior work has explored methods to automatically generate IGT in order to reduce the time cost of language analysis. However, many languages (particularly those requiring preservation) lack sufficient IGT data to train effective models, and crosslingual transfer has been proposed as a method to overcome …

