April 17, 2023, 8:22 p.m. | Soumya Dutta, Sriram Ganapathy

cs.CL updates on arXiv.org

Emotion recognition in conversations is challenging due to the multi-modal
nature of emotion expression. We propose a hierarchical cross-attention
model (HCAM) approach to multi-modal emotion recognition using a combination of
recurrent and co-attention neural network models. The input to the model
consists of two modalities: (i) audio data, processed through a learnable
wav2vec approach, and (ii) text data, represented using a bidirectional encoder
representations from transformers (BERT) model. The audio and text
representations are processed using a set of …
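
To illustrate the co-attention idea mentioned in the abstract, here is a minimal sketch in plain Python. It is not the authors' HCAM implementation: the function names (`cross_attention`, `co_attention`), the toy 2-dimensional vectors, and the absence of learned projection matrices and recurrent layers are all assumptions made for illustration. Each modality simply attends over the other using scaled dot-product attention.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Scaled dot-product attention of each query over `context`.

    `context` serves as both keys and values (no learned projections;
    this is a simplifying assumption, not the paper's design).
    """
    dim = len(queries[0])
    scale = math.sqrt(dim)
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in context]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[d] for w, v in zip(weights, context)) for d in range(dim)]
        )
    return attended


def co_attention(audio, text):
    """Hypothetical helper: audio attends to text, and text attends to audio."""
    return cross_attention(audio, text), cross_attention(text, audio)


# Toy stand-ins for wav2vec (audio) and BERT (text) embeddings.
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text = [[0.5, 0.5], [1.0, -1.0]]
audio_ctx, text_ctx = co_attention(audio, text)
```

Each output keeps the sequence length of its query modality, so `audio_ctx` has one vector per audio frame and `text_ctx` one per text token. In a trained model the queries, keys, and values would normally pass through learned linear projections first.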

