Web: http://arxiv.org/abs/2201.11391

Jan. 28, 2022, 2:10 a.m. | Jivnesh Sandhan, Ayush Daksh, Om Adideva Paranjay, Laxmidhar Behera, Pawan Goyal

cs.CL updates on arXiv.org

Code-mixing has become ubiquitous in Natural Language Processing (NLP);
however, no efforts have been made to address this phenomenon for the Speech
Translation (ST) task. This can be solely attributed to the lack of labelled
code-mixed ST data. Thus, we introduce Prabhupadavani, a multilingual
code-mixed ST dataset for 25 languages, covering ten language families and
containing 94 hours of speech by 130+ speakers, manually aligned with the
corresponding text in the target language. Prabhupadavani is the first
code-mixed ST dataset available …


