March 8, 2024, 5:47 a.m. | Priya Rani, Gaurav Negi, Theodorus Fransen, John P. McCrae

cs.CL updates on arXiv.org arxiv.org

arXiv:2403.04639v1 Announce Type: new
Abstract: The present paper introduces new sentiment data, MaCMS, for Magahi-Hindi-English (MHE) code-mixed language, where Magahi is a less-resourced minority language. This dataset is the first Magahi-Hindi-English code-mixed dataset for sentiment analysis tasks. Further, we also provide a linguistics analysis of the dataset to understand the structure of code-mixing and a statistical study to understand the language preferences of speakers with different polarities. With these analyses, we also train baseline models to evaluate the dataset's quality.

abstract analysis arxiv code cs.cl data dataset english hindi language linguistics mixed paper sentiment sentiment analysis tasks type

Founding AI Engineer, Agents

@ Occam AI | New York

AI Engineer Intern, Agents

@ Occam AI | US

AI Research Scientist

@ Vara | Berlin, Germany and Remote

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Business Intelligence Architect - Specialist

@ Eastman | Hyderabad, IN, 500 008