June 7, 2024, 4:51 a.m. | Jiaming Zhou, Shiwan Zhao, Hui Wang, Tian-Hao Zhang, Haoqin Sun, Xuechen Wang, Yong Qin

cs.CL updates on arXiv.org

arXiv:2406.03814v1 Announce Type: new
Abstract: The kNN-CTC model has proven to be effective for monolingual automatic speech recognition (ASR). However, its direct application to multilingual scenarios, such as code-switching, presents challenges. Although there is potential for performance improvement, a kNN-CTC model utilizing a single bilingual datastore can inadvertently introduce undesirable noise from the alternative language. To address this, we propose a novel kNN-CTC-based code-switching ASR (CS-ASR) framework that employs dual monolingual datastores and a gated datastore selection mechanism to reduce noise …
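
To make the idea of dual monolingual datastores with a selection gate concrete, here is a minimal sketch for a single frame. It is not the authors' implementation. It assumes a kNN-LM-style interpolation of a retrieved distribution with the CTC posterior, and it takes the gate decision as a plain input, because the visible part of the abstract does not say how the gate is computed. The names `knn_distribution`, `gated_knn_ctc`, `lam`, and the toy datastores are all hypothetical.

```python
import numpy as np

def knn_distribution(query, keys, values, vocab_size, k=1, temperature=1.0):
    """Turn the k nearest datastore entries into a distribution over tokens."""
    dists = np.sum((keys - query) ** 2, axis=1)   # squared L2 distance to every key
    nearest = np.argsort(dists)[:k]               # indices of the k closest keys
    # Subtract the smallest distance before exponentiating to avoid underflow.
    weights = np.exp(-(dists[nearest] - dists[nearest].min()) / temperature)
    probs = np.zeros(vocab_size)
    np.add.at(probs, values[nearest], weights)    # sum weights per retrieved token
    return probs / probs.sum()

def gated_knn_ctc(query, p_ctc, datastores, gate, lam=0.5, k=1):
    """Interpolate the CTC posterior with kNN retrieval from ONE datastore.

    `gate` is the name of the datastore to use. How that choice is made is
    an assumption left outside this sketch.
    """
    keys, values = datastores[gate]
    p_knn = knn_distribution(query, keys, values, len(p_ctc), k=k)
    return lam * p_knn + (1.0 - lam) * p_ctc

# Toy data (hypothetical): 4-token vocabulary, 2-dimensional frame features.
datastores = {
    "zh": (np.array([[0.0, 0.0], [0.0, 1.0]]), np.array([1, 2])),
    "en": (np.array([[5.0, 5.0], [5.0, 6.0]]), np.array([3, 3])),
}
p_ctc = np.full(4, 0.25)        # uninformative CTC posterior for this frame
query = np.array([0.1, 0.0])    # frame representation used as the retrieval key

fused = gated_knn_ctc(query, p_ctc, datastores, gate="zh")
wrong = gated_knn_ctc(query, p_ctc, datastores, gate="en")
print(fused.argmax(), wrong.argmax())
```

In this toy setup, routing the frame to the "zh" datastore raises the probability of a token stored there, while routing it to the "en" datastore pulls the prediction toward an English token even though the nearest English key is far away. That is the kind of cross-language noise the abstract attributes to retrieval that is not restricted by language.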


