Text normalization for endangered languages: the case of Ligurian. (arXiv:2206.07861v1 [cs.CL])
June 17, 2022, 1:12 a.m. | Stefano Lusito, Edoardo Ferrante, Jean Maillard
cs.CL updates on arXiv.org (arxiv.org)
Text normalization is a crucial technology for low-resource languages that
lack rigid spelling conventions. Low-resource text normalization has so far
relied upon hand-crafted rules, which are perceived to be more data efficient
than neural methods.
In this paper we examine the case of text normalization for Ligurian, an
endangered Romance language. We collect 4,394 Ligurian sentences paired with
their normalized versions, as well as the first monolingual corpus for
Ligurian. We show that, in spite of the small amounts of …