Aug. 10, 2023, 4:47 a.m. | Samiul Islam, Md. Maksudul Haque, Abu Jobayer Md. Sadat

cs.CL updates on arXiv.org arxiv.org

Traditional approaches in speech emotion recognition, such as LSTM, CNN, RNN,
SVM, and MLP, have limitations such as difficulty capturing long-term
dependencies in sequential data, capturing the temporal dynamics, and
struggling to capture complex patterns and relationships in multimodal data.
This research addresses these shortcomings by proposing an ensemble model that
combines Graph Convolutional Networks (GCN) for processing textual data and the
HuBERT transformer for analyzing audio signals. We found that GCNs excel at
capturing Long-term contextual dependencies and relationships …

arxiv cnn data deep learning deep learning techniques dependencies dynamics emotion information limitations long-term lstm mlp multimodal multimodal data patterns recognition relationships research rnn speech speech emotion svm temporal

AI Research Scientist

@ Vara | Berlin, Germany and Remote

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Data Science Analyst

@ Mayo Clinic | AZ, United States

Sr. Data Scientist (Network Engineering)

@ SpaceX | Redmond, WA