A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding. (arXiv:2111.02735v3 [cs.CL] UPDATED)
cs.CL updates on arXiv.org
Speech self-supervised models such as wav2vec 2.0 and HuBERT are making
revolutionary progress in Automatic Speech Recognition (ASR). However, they
have not yet been conclusively shown to improve performance on tasks other than
ASR. In this work, we explored partial and entire fine-tuning of
wav2vec 2.0 and HuBERT pre-trained models for three non-ASR speech tasks:
Speech Emotion Recognition, Speaker Verification and Spoken Language
Understanding. With the simple downstream frameworks we propose, the best scores
reached 79.58% weighted accuracy in the speaker-dependent setting …
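
The abstract reports its emotion recognition score as weighted accuracy. As a minimal sketch, assuming the convention common in Speech Emotion Recognition work (weighted accuracy is the overall fraction of correctly classified utterances, while unweighted accuracy is the mean of per-class recalls), the two metrics can be computed as follows. The function names and the toy labels are hypothetical illustrations and are not taken from the paper.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by total utterances."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    totals = Counter(y_true)                                    # utterances per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical, imbalanced toy labels (not data from the paper).
y_true = ["neutral"] * 6 + ["angry"] * 2 + ["sad"] * 2
y_pred = ["neutral"] * 6 + ["angry", "neutral"] + ["neutral", "neutral"]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
print(f"weighted accuracy:   {wa:.2f}")
print(f"unweighted accuracy: {ua:.2f}")
```

On imbalanced data the two numbers can diverge noticeably, because weighted accuracy is dominated by the majority class. This is why a reported score is only interpretable once the metric and the evaluation setting (speaker-dependent or speaker-independent) are known.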