WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing. (arXiv:2110.13900v5 [cs.CL] UPDATED)
cs.CL updates on arXiv.org
Self-supervised learning (SSL) has achieved great success in speech recognition,
but has seen limited exploration for other speech processing tasks. As the
speech signal contains multi-faceted information including speaker identity,
paralinguistics, and spoken content, learning universal representations for
all speech tasks is challenging. To tackle this problem, we propose a new
pre-trained model, WavLM, to solve full-stack downstream speech tasks. WavLM
jointly learns masked speech prediction and denoising in pre-training. In this
way, WavLM not only keeps the speech content …
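The sketch below illustrates the two ideas the abstract names, not the authors' actual code: (1) corrupting the input by mixing in noise or an overlapping utterance (the denoising objective) and (2) a masked-prediction loss computed only at masked frames against discrete pseudo-labels of the clean speech. All names here (mix_for_denoising, SmallEncoder, the frame/mask sizes) are hypothetical, and the random waveforms and labels are stand-ins for real utterances and clustered targets.

```python
# Minimal sketch of WavLM-style joint denoising + masked speech prediction.
# Assumptions: toy encoder, random waveforms, random pseudo-labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mix_for_denoising(wave, other, snr_db=5.0):
    """Overlay an interfering signal at a given SNR to simulate the denoising input."""
    power = wave.pow(2).mean(dim=-1, keepdim=True)
    noise_power = other.pow(2).mean(dim=-1, keepdim=True).clamp_min(1e-8)
    scale = torch.sqrt(power / noise_power / (10 ** (snr_db / 10)))
    return wave + scale * other

class SmallEncoder(nn.Module):
    """Toy stand-in for the Transformer encoder: frame the waveform, embed, classify."""
    def __init__(self, frame=320, dim=256, vocab=100):
        super().__init__()
        self.frame = frame
        self.proj = nn.Linear(frame, dim)
        self.body = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.head = nn.Linear(dim, vocab)

    def forward(self, wave):
        # (B, T) -> (B, T // frame, frame) -> logits over the pseudo-label vocabulary
        frames = wave.unfold(-1, self.frame, self.frame)
        return self.head(self.body(self.proj(frames)))

B, T, vocab = 4, 16000, 100
clean = torch.randn(B, T)          # stand-in for real utterances
interferer = torch.randn(B, T)     # noise or an overlapping utterance
noisy = mix_for_denoising(clean, interferer)

model = SmallEncoder(vocab=vocab)
logits = model(noisy)              # (B, n_frames, vocab)
n_frames = logits.size(1)

# Random span masking over frames; the loss is computed only where mask is True.
mask = torch.zeros(B, n_frames, dtype=torch.bool)
for b in range(B):
    start = torch.randint(0, n_frames - 10, (1,)).item()
    mask[b, start:start + 10] = True

# Pseudo-labels describe the CLEAN speech (random here; clustered features in
# practice), so predicting them from the noisy input couples content modeling
# with implicit denoising.
targets = torch.randint(0, vocab, (B, n_frames))
loss = F.cross_entropy(logits[mask], targets[mask])
loss.backward()
print(f"masked-prediction loss on denoising input: {loss.item():.3f}")
```

Feeding the corrupted waveform while targeting labels of the clean speech is what lets a single objective serve both ASR-style content tasks and non-ASR tasks such as speaker verification or separation.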