March 21, 2024, 4:46 a.m. | R. Gnana Praveen, Jahangir Alam

cs.CV updates on arXiv.org

arXiv:2403.13659v1 Announce Type: new
Abstract: Multimodal emotion recognition has recently attracted considerable attention because it can leverage diverse and complementary relationships across multiple modalities, such as audio, visual, and text. Most state-of-the-art methods for multimodal fusion rely on recurrent networks or conventional attention mechanisms that do not effectively exploit the complementary nature of the modalities. In this paper, we focus on dimensional emotion recognition based on the fusion of facial, vocal, and text modalities extracted from videos. Specifically, …
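The paper's own architecture is truncated above, so the following is only a minimal, generic sketch of the kind of cross-modal attention fusion the abstract contrasts with recurrent and conventional-attention baselines: each modality's feature sequence attends to the other two before pooling into a joint embedding. All module names, dimensions, and the pooling strategy are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch of cross-modal attention fusion for audio, visual,
# and text feature streams. Not the paper's architecture; dimensions and
# the one-shared-attention-block design are assumptions for illustration.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        # Each modality (as query) attends to the other two (as context).
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(3 * dim, dim)

    def forward(self, audio, visual, text):
        # audio/visual/text: (batch, seq_len, dim) feature sequences
        feats = {"audio": audio, "visual": visual, "text": text}
        fused = []
        for name, query in feats.items():
            # Context = the other two modalities, concatenated along time.
            ctx = torch.cat([v for k, v in feats.items() if k != name], dim=1)
            out, _ = self.attn(query, ctx, ctx)   # cross-attention
            fused.append(out.mean(dim=1))         # pool over time
        return self.proj(torch.cat(fused, dim=-1))  # joint embedding

# Usage: three dummy feature streams -> one fused embedding per clip,
# which a small regression head could map to valence/arousal scores.
a = torch.randn(2, 50, 256)   # e.g., frame-level vocal features
v = torch.randn(2, 30, 256)   # e.g., per-frame facial features
t = torch.randn(2, 20, 256)   # e.g., token embeddings of a transcript
emb = CrossModalFusion()(a, v, t)   # shape: (2, 256)
```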

