Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation
May 8, 2024, 4:46 a.m. | Dogucan Yaman, Fevziye Irem Eyiokur, Leonard Bärmann, Seymanur Aktı, Hazım Kemal Ekenel, Alexander Waibel
cs.CV updates on arXiv.org
Abstract: In the task of talking face generation, the objective is to generate a face video with lips synchronized to the corresponding audio while preserving visual details and identity information. Current methods face the challenge of learning accurate lip synchronization while avoiding detrimental effects on visual quality, as well as robustly evaluating such synchronization. To tackle these problems, we propose utilizing an audio-visual speech representation expert (AV-HuBERT) for calculating lip synchronization loss during training. Moreover, leveraging …