June 17, 2022, 1:12 a.m. | Stefano Lusito, Edoardo Ferrante, Jean Maillard

cs.CL updates on arXiv.org

Text normalization is a crucial technology for low-resource languages that
lack rigid spelling conventions. Low-resource text normalization has so far
relied upon hand-crafted rules, which are perceived to be more data-efficient
than neural methods.


In this paper we examine the case of text normalization for Ligurian, an
endangered Romance language. We collect 4,394 Ligurian sentences paired with
their normalized versions, as well as the first monolingual corpus for
Ligurian. We show that, in spite of the small amounts of …
