Feb. 21, 2024, 5:43 a.m. | Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier

cs.LG updates on arXiv.org

arXiv:2305.03582v3
Abstract: In this paper, we present a multimodal and dynamical VAE (MDVAE) applied to unsupervised audio-visual speech representation learning. The latent space is structured to dissociate the latent dynamical factors that are shared between the modalities from those that are specific to each modality. A static latent variable is also introduced to encode the information that is constant over time within an audiovisual speech sequence. The model is trained in an unsupervised manner on an audiovisual …

