April 3, 2024, 4:47 a.m. | Jian Zhu, Changbing Yang, Farhan Samir, Jahurul Islam

cs.CL updates on arXiv.org

arXiv:2311.08323v2 Announce Type: replace
Abstract: In this project, we demonstrate that phoneme-based models for speech processing can achieve strong crosslinguistic generalizability to unseen languages. We curated the IPAPACK, a massively multilingual speech corpus with phonemic transcriptions, encompassing more than 115 languages from diverse language families, selectively checked by linguists. Based on the IPAPACK, we propose CLAP-IPA, a multilingual phoneme-speech contrastive embedding model capable of open-vocabulary matching between arbitrary speech signals and phonemic sequences. The proposed model was tested on 95 …

