A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding. (arXiv:2111.02735v2 [cs.CL] UPDATED)
cs.CL updates on arXiv.org
Self-supervised speech models such as wav2vec 2.0 and HuBERT are making
revolutionary progress in Automatic Speech Recognition (ASR). However, they
have not yet been fully shown to improve performance on tasks other than
ASR. In this work, we explore partial fine-tuning and entire fine-tuning of
wav2vec 2.0 and HuBERT pre-trained models for three non-ASR speech tasks:
Speech Emotion Recognition, Speaker Verification and Spoken Language
Understanding. With the simple downstream frameworks we propose, the best scores
reach 79.58% weighted accuracy for Speech …
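
For readers unfamiliar with the metric quoted in the abstract: in speech emotion recognition work, "weighted accuracy" usually means overall accuracy (correct predictions divided by all samples), as opposed to "unweighted accuracy", the mean of per-class recalls. The sketch below assumes that convention; the function names and the toy labels are illustrative only and are not taken from the paper.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions / total samples.

    Assumption: this follows the common SER convention in which each
    class contributes in proportion to its number of samples.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equal in length")
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy, made-up labels: an imbalanced set dominated by "neutral".
y_true = ["neutral"] * 6 + ["angry"] * 2 + ["sad"] * 2
y_pred = ["neutral"] * 6 + ["angry", "neutral"] + ["neutral", "neutral"]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
print(f"WA={wa:.2f} UA={ua:.2f}")
```

On imbalanced data the two numbers can diverge sharply, which is why papers in this area tend to state which one they report.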