Memories are One-to-Many Mapping Alleviators in Talking Face Generation
March 6, 2024, 5:46 a.m. | Anni Tang, Tianyu He, Xu Tan, Jun Ling, Li Song
cs.CV updates on arXiv.org
Abstract: Talking face generation aims at generating photo-realistic video portraits of a target person driven by input audio. Due to its nature of one-to-many mapping from the input audio to the output video (e.g., one speech content may have multiple feasible visual appearances), learning a deterministic mapping like previous works brings ambiguity during training, and thus causes inferior visual results. Although this one-to-many mapping could be alleviated in part by a two-stage framework (i.e., an audio-to-expression …