all AI news
Joint Multimodal Transformer for Dimensional Emotional Recognition in the Wild
March 18, 2024, 4:42 a.m. | Paul Waligora, Osama Zeeshan, Haseeb Aslam, Soufiane Belharbi, Alessandro Lameiras Koerich, Marco Pedersoli, Simon Bacon, Eric Granger
cs.LG updates on arXiv.org arxiv.org
Abstract: Audiovisual emotion recognition (ER) in videos has immense potential over unimodal performance. It effectively leverages the inter- and intra-modal dependencies between visual and auditory modalities. This work proposes a novel audio-visual emotion recognition system utilizing a joint multimodal transformer architecture with key-based cross-attention. This framework aims to exploit the complementary nature of audio and visual cues (facial expressions and vocal patterns) in videos, leading to superior performance compared to solely relying on a single modality. …
abstract architecture arxiv attention audio cs.cv cs.lg cs.sd dependencies eess.as emotion framework key modal multimodal novel performance recognition transformer transformer architecture type videos visual work
More from arxiv.org / cs.LG updates on arXiv.org
Testing the Segment Anything Model on radiology data
1 day, 19 hours ago |
arxiv.org
Calorimeter shower superresolution
1 day, 19 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Software Engineer for AI Training Data (School Specific)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Python)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Tier 2)
@ G2i Inc | Remote
Data Engineer
@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania
Artificial Intelligence – Bioinformatic Expert
@ University of Texas Medical Branch | Galveston, TX
Lead Developer (AI)
@ Cere Network | San Francisco, US