May 16, 2024, 4:42 a.m. | Samir Sadok, Simon Leglaive, Renaud Séguier

cs.LG updates on arXiv.org

arXiv:2305.03568v2 Announce Type: replace-cross
Abstract: The limited availability of labeled data is a major challenge in audiovisual speech emotion recognition (SER). Self-supervised learning approaches have recently been proposed to mitigate the need for labeled data in various applications. This paper proposes the VQ-MAE-AV model, a vector quantized masked autoencoder (MAE) designed for audiovisual speech self-supervised representation learning and applied to SER. Unlike previous approaches, the proposed method employs a self-supervised paradigm based on discrete audio and visual speech representations learned …

