Web: http://arxiv.org/abs/2206.07981

June 20, 2022, 1:13 a.m. | Lianyang Ma, Yu Yao, Tao Liang, Tongliang Liu

cs.CV updates on arXiv.org

Multimodal sentiment analysis in videos is a key task in many real-world
applications, which usually requires integrating multimodal streams including
visual, verbal and acoustic behaviors. To improve the robustness of multimodal
fusion, some of the existing methods let different modalities communicate with
each other and model the crossmodal interaction via transformers. However,
these methods use only single-scale representations during the interaction
and fail to exploit multi-scale representations that contain different levels
of semantic information. As a result, the representations …


