Web: http://arxiv.org/abs/2201.11391

Jan. 28, 2022, 2:10 a.m. | Jivnesh Sandhan, Ayush Daksh, Om Adideva Paranjay, Laxmidhar Behera, Pawan Goyal

cs.CL updates on arXiv.org arxiv.org

Nowadays, code-mixing has become ubiquitous in Natural Language Processing
(NLP); however, no efforts have been made to address this phenomenon for the Speech
Translation (ST) task. This can be solely attributed to the lack of labelled
data for the code-mixed ST task. Thus, we introduce Prabhupadavani, a multilingual
code-mixed ST dataset for 25 languages, covering ten language families,
containing 94 hours of speech by 130+ speakers, manually aligned with
corresponding text in the target language. Prabhupadavani is the first
code-mixed ST dataset available …

