Web: http://arxiv.org/abs/2110.06650

May 5, 2022, 1:12 a.m. | Andreas Triantafyllopoulos, Uwe Reichel, Shuo Liu, Stephan Huber, Florian Eyben, Björn W. Schuller

cs.LG updates on arXiv.org

In this contribution, we investigate the effectiveness of deep fusion of text
and audio features for categorical and dimensional speech emotion recognition
(SER). We propose a novel, multistage fusion method where the two information
streams are integrated in several layers of a deep neural network (DNN), and
contrast it with a single-stage one where the streams are merged in a single
point. Both methods depend on extracting summary linguistic embeddings from a
pre-trained BERT model, and conditioning one or more …
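The multistage idea described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: the layer sizes, the use of simple concatenation as the conditioning mechanism, and the class `MultistageFusionSER` are all assumptions made here. The key contrast is that the summary text embedding is re-injected at every block, whereas a single-stage model would merge the two streams only once.

```python
import torch
import torch.nn as nn

class MultistageFusionSER(nn.Module):
    """Illustrative multistage text-audio fusion (hypothetical dims/fusion op).

    A summary linguistic embedding (e.g. from a pre-trained BERT model) is
    concatenated with the audio representation at *each* stage of the DNN,
    rather than at a single merge point.
    """

    def __init__(self, audio_dim=128, text_dim=768, hidden=64,
                 n_classes=4, n_stages=3):
        super().__init__()
        blocks, in_dim = [], audio_dim
        for _ in range(n_stages):
            # each stage sees the current audio state plus the text embedding
            blocks.append(nn.Sequential(
                nn.Linear(in_dim + text_dim, hidden), nn.ReLU()))
            in_dim = hidden
        self.blocks = nn.ModuleList(blocks)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, audio, text_emb):
        h = audio
        for block in self.blocks:
            # re-inject the text stream at every stage (multistage fusion)
            h = block(torch.cat([h, text_emb], dim=-1))
        return self.head(h)

model = MultistageFusionSER()
audio = torch.randn(2, 128)      # batch of pooled audio features
text = torch.randn(2, 768)       # batch of summary BERT embeddings
logits = model(audio, text)      # shape: (2, 4)
```

A single-stage baseline would instead run the audio blocks on `h` alone and perform one `torch.cat` just before the classification head.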

