Capturing Spectral and Long-term Contextual Information for Speech Emotion Recognition Using Deep Learning Techniques. (arXiv:2308.04517v1 [cs.SD])
cs.CL updates on arXiv.org arxiv.org
Traditional approaches to speech emotion recognition, such as LSTM, CNN, RNN,
SVM, and MLP, have several limitations: they have difficulty capturing
long-term dependencies and temporal dynamics in sequential data, and they
struggle to capture complex patterns and relationships in multimodal data.
This research addresses these shortcomings by proposing an ensemble model that
combines Graph Convolutional Networks (GCN) for processing textual data and the
HuBERT transformer for analyzing audio signals. We found that GCNs excel at
capturing long-term contextual dependencies and relationships …
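
To make the ensemble idea above concrete, here is a minimal sketch in NumPy. It shows one graph-convolution step in the standard normalized form, ReLU(D^-1/2 (A + I) D^-1/2 H W), and a simple late-fusion step that averages class probabilities from a text branch and an audio branch. The excerpt does not say how the two branches are combined, so the weighted-average fusion rule, the function names (gcn_layer, late_fusion), the toy graph, and the random stand-in logits are all assumptions for illustration and not the paper's method. No real HuBERT model is loaded here.

```python
import numpy as np


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops
    deg = a_hat.sum(axis=1)              # degrees are >= 1 because of self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def late_fusion(text_logits, audio_logits, text_weight=0.5):
    """Weighted average of per-branch class probabilities (assumed fusion rule)."""
    return (text_weight * softmax(text_logits)
            + (1.0 - text_weight) * softmax(audio_logits))


rng = np.random.default_rng(0)

# Hypothetical toy graph: three nodes in a chain (e.g. three consecutive utterances).
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
feats = rng.normal(size=(3, 4))    # 4 input features per node
weight = rng.normal(size=(4, 2))   # projects to 2 hidden features
hidden = gcn_layer(adj, feats, weight)

# Random stand-ins for the outputs of the two branches over 4 emotion classes.
text_logits = rng.normal(size=(3, 4))    # would come from the GCN text branch
audio_logits = rng.normal(size=(3, 4))   # would come from the HuBERT audio branch
probs = late_fusion(text_logits, audio_logits)
```

Each row of probs is a probability distribution over the emotion classes for one utterance. Averaging probabilities is only one of several plausible ways to build such an ensemble; learned fusion layers or stacking are common alternatives.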