GlossLM: Multilingual Pretraining for Low-Resource Interlinear Glossing
March 12, 2024, 4:51 a.m. | Michael Ginn (University of Colorado), Lindia Tjuatja (Carnegie Mellon University), Taiqi He (Carnegie Mellon University), Enora Rice (University of Colorado), Gr…
cs.CL updates on arXiv.org arxiv.org
Abstract: A key aspect of language documentation is the creation of annotated text in a format such as interlinear glossed text (IGT), which captures fine-grained morphosyntactic analyses in a morpheme-by-morpheme format. Prior work has explored methods to automatically generate IGT in order to reduce the time cost of language analysis. However, many languages (particularly those requiring preservation) lack sufficient IGT data to train effective models, and crosslingual transfer has been proposed as a method to overcome …
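For readers unfamiliar with the format, an IGT entry aligns each morpheme in a transcription with a grammatical gloss, alongside a free translation. A minimal sketch of such an entry as a data structure (the class and the Turkish example are illustrative, not taken from the paper):

```python
from dataclasses import dataclass

@dataclass
class IGTEntry:
    """One line of interlinear glossed text: a transcription with
    aligned morpheme segmentation, glosses, and a free translation."""
    transcription: str
    morphemes: list
    glosses: list
    translation: str

    def aligned(self):
        # Pair each morpheme with its gloss, as in a standard IGT layout.
        return list(zip(self.morphemes, self.glosses))

# Illustrative Turkish example, one word segmented morpheme by morpheme:
# evlerimde = ev-ler-im-de = house-PL-1SG.POSS-LOC = "in my houses"
entry = IGTEntry(
    transcription="evlerimde",
    morphemes=["ev", "ler", "im", "de"],
    glosses=["house", "PL", "1SG.POSS", "LOC"],
    translation="in my houses",
)
print(entry.aligned())
```

Automatic glossing models like the one described here learn to predict the gloss line from the transcription, which is what makes sufficient IGT training data so important.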