HCAM -- Hierarchical Cross Attention Model for Multi-modal Emotion Recognition. (arXiv:2304.06910v1 [eess.AS])
cs.CL updates on arXiv.org
Emotion recognition in conversations is challenging because emotion
expression is inherently multi-modal. We propose a hierarchical cross-attention
model (HCAM) for multi-modal emotion recognition that combines recurrent and
co-attention neural network models. The model takes two input modalities:
i) audio data, processed through a learnable wav2vec approach, and ii) text
data, represented using a Bidirectional Encoder Representations from
Transformers (BERT) model. The audio and text representations are processed
using a set of …
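The abstract is cut off before it details the layers, so what follows is only a generic sketch of cross-modal co-attention (scaled dot-product attention with queries from one modality and keys/values from the other), not HCAM's published architecture. It assumes the text token vectors (e.g. from BERT) and audio frame vectors (e.g. from wav2vec) have already been mapped to a shared dimension. The function names and toy inputs are hypothetical, learned query/key/value projections are omitted, and the recurrent and hierarchical stages are not modeled.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: queries from one modality,
    keys/values from the other.

    queries: n_q x d, keys: n_k x d, values: n_k x d_v.
    Returns n_q x d_v; each output row is a softmax-weighted sum of value rows.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    d_v = len(values[0])
    out: Matrix = []
    for q in queries:
        # Similarity of this query to every key in the other modality.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)])
    return out


def co_attention(text: Matrix, audio: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical symmetric co-attention: text attends to audio and audio
    attends to text. Identity Q/K/V projections are an assumption of this sketch."""
    text_ctx = cross_attention(text, audio, audio)   # audio context per text token
    audio_ctx = cross_attention(audio, text, text)   # text context per audio frame
    return text_ctx, audio_ctx
```

A toy usage, with two "token" vectors and three "frame" vectors of dimension 2:

```python
text = [[1.0, 0.0], [0.0, 1.0]]
audio = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
text_ctx, audio_ctx = co_attention(text, audio)
print(len(text_ctx), len(text_ctx[0]))  # 2 2
```

Running attention in both directions lets each modality summarize the other; a real model would add learned projections and typically fuse these context vectors with the original features before classification.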