May 1, 2024, 4:47 a.m. | Zachary William Hopton (Language,Space Lab, University of Zurich), No\"emi Aepli (Department of Computational Linguistics, University of Zurich)

cs.CL updates on arXiv.org arxiv.org

arXiv:2404.19315v1 Announce Type: new
Abstract: Effectively normalizing textual data poses a considerable challenge, especially for low-resource languages lacking standardized writing systems. In this study, we fine-tuned a multilingual model with data from several Occitan dialects and conducted a series of experiments to assess the model's representations of these dialects. For evaluation purposes, we compiled a parallel lexicon encompassing four Occitan dialects. Intrinsic evaluations of the model's embeddings revealed that surface similarity between the dialects strengthened representations. When the model was …

abstract arxiv challenge cs.cl data evaluation languages low modeling multilingual series study systems textual type variation writing

Software Engineer for AI Training Data (School Specific)

@ G2i Inc | Remote

Software Engineer for AI Training Data (Python)

@ G2i Inc | Remote

Software Engineer for AI Training Data (Tier 2)

@ G2i Inc | Remote

Data Engineer

@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

Lead Developer (AI)

@ Cere Network | San Francisco, US