March 26, 2024, 4:44 a.m. | Kartik, Sanjana Soni, Anoop Kunchukuttan, Tanmoy Chakraborty, Md Shad Akhtar

cs.LG updates on arXiv.org arxiv.org

arXiv:2403.16771v1 Announce Type: cross
Abstract: The widespread online communication in a modern multilingual world has provided opportunities to blend more than one language (aka code-mixed language) in a single utterance. This has resulted a formidable challenge for the computational models due to the scarcity of annotated data and presence of noise. A potential solution to mitigate the data scarcity problem in low-resource setup is to leverage existing data in resource-rich language through translation. In this paper, we tackle the problem …

abstract annotated data arxiv blend challenge code communication computational cs.cl cs.lg data language mixed modern multilingual opportunities robust synthetic synthetic data translation type world

AI Research Scientist

@ Vara | Berlin, Germany and Remote

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Senior Data Scientist

@ ITE Management | New York City, United States