Embedded Translations for Low-resource Automated Glossing
March 14, 2024, 4:48 a.m. | Changbing Yang, Garrett Nicolai, Miikka Silfverberg
cs.CL updates on arXiv.org
Abstract: We investigate automatic interlinear glossing in low-resource settings. We augment a hard-attentional neural model with embedded translation information extracted from interlinear glossed text. After encoding these translations using large language models, specifically BERT and T5, we introduce a character-level decoder for generating glossed output. Aided by these enhancements, our model demonstrates an average improvement of 3.97 percentage points over the previous state of the art on datasets from the SIGMORPHON 2023 Shared Task on Interlinear Glossing. In …
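The abstract describes fusing a sentence-level embedding of the free translation into a character-level glossing decoder. The sketch below is a loose, hypothetical illustration of that pipeline shape only: `embed_translation` is a deterministic bag-of-words stand-in for a real BERT/T5 encoder, and `gloss_with_translation` merely gates lexicon lookups by similarity to the translation embedding rather than running a trained attentional decoder. All names and the lexicon format are invented for illustration and are not from the paper.

```python
import numpy as np

def embed_translation(translation: str, dim: int = 32) -> np.ndarray:
    """Stand-in for a BERT/T5 sentence encoder: a deterministic
    hashed bag-of-words embedding (illustration only)."""
    vec = np.zeros(dim)
    for tok in translation.lower().split():
        # Stable toy hash (sum of character codes), so runs are repeatable.
        vec[sum(ord(c) for c in tok) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def gloss_with_translation(source_units, translation, gloss_lexicon):
    """Toy glosser: emits one gloss per source unit, conditioned on the
    translation embedding. Here the embedding only disambiguates among
    candidate glosses; a trained model would attend over it instead."""
    t_vec = embed_translation(translation)
    output = []
    for unit in source_units:
        candidates = gloss_lexicon.get(unit, [unit])
        # Hypothetical fusion step: prefer the candidate gloss whose own
        # embedding is most similar to the translation embedding.
        best = max(candidates,
                   key=lambda g: float(embed_translation(g) @ t_vec))
        output.append(best)
    return "-".join(output)

# Example: the translation "the dog runs" disambiguates an ambiguous unit.
lexicon = {"x": ["DOG", "CAT"]}
print(gloss_with_translation(["x"], "the dog runs", lexicon))  # → DOG
```

The point of the sketch is the interface, not the model: translation context enters as a dense vector that biases every glossing decision, which is roughly the role the paper assigns to the LLM-encoded translations.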