Web: http://arxiv.org/abs/2109.15053

May 9, 2022, 1:11 a.m. | Nik Vaessen, David A. van Leeuwen

cs.LG updates on arXiv.org arxiv.org

This paper explores applying the wav2vec2 framework to speaker recognition
instead of speech recognition. We study the effectiveness of the pre-trained
weights on the speaker recognition task, and how to pool the wav2vec2 output
sequence into a fixed-length speaker embedding. To adapt the framework to
speaker recognition, we propose a single-utterance classification variant with
CE or AAM softmax loss, and an utterance-pair classification variant with BCE
loss. Our best performing variant, w2v2-aam, achieves a 1.88% EER on the
extended voxceleb1 …
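
The pooling step mentioned in the abstract, which turns a variable-length wav2vec2 output sequence into a fixed-length speaker embedding, can be illustrated with a minimal sketch. Mean pooling over time is assumed here purely for illustration; the abstract does not say which pooling method works best, and the function name `mean_pool` and the toy frame values are hypothetical.

```python
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average a (T x D) sequence of frame vectors into one D-dimensional vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frames must have the same dimension")
    # Average each feature dimension across all T time steps.
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


# Toy stand-ins for encoder outputs: two utterances of different length,
# each frame a 2-dimensional vector (real encoder outputs are much wider).
short_utt = [[1.0, 2.0], [3.0, 4.0]]
long_utt = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0], [6.0, 6.0]]

emb_short = mean_pool(short_utt)
emb_long = mean_pool(long_utt)
```

Both embeddings have the same length regardless of how many frames the utterance contains, which is what allows them to be fed to a classification head or compared pairwise.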
