March 28, 2024, 4:41 a.m. | Leonardo Pepino, Pablo Riera, Luciana Ferrer, Agustin Gravano

cs.LG updates on arXiv.org arxiv.org

arXiv:2403.18635v1 Announce Type: new
Abstract: In this paper, we study different approaches for classifying emotions from speech using acoustic and text-based features. We propose to obtain contextualized word embeddings with BERT to represent the information contained in speech transcriptions and show that this results in better performance than using Glove embeddings. We also propose and compare different strategies to combine the audio and text modalities, evaluating them on IEMOCAP and MSP-PODCAST datasets. We find that fusing acoustic and text-based systems …

abstract arxiv bert cs.lg cs.sd eess.as embeddings emotion emotions features fusion information paper performance recognition results show speech study text the information type word word embeddings

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Risk Management - Machine Learning and Model Delivery Services, Product Associate - Senior Associate-

@ JPMorgan Chase & Co. | Wilmington, DE, United States

Senior ML Engineer (Speech/ASR)

@ ObserveAI | Bengaluru