March 6, 2024, 5:46 a.m. | Anni Tang, Tianyu He, Xu Tan, Jun Ling, Li Song

cs.CV updates on arXiv.org

arXiv:2212.05005v3 Announce Type: replace
Abstract: Talking face generation aims at generating photo-realistic video portraits of a target person driven by input audio. Because the mapping from input audio to output video is one-to-many (e.g., the same speech content may have multiple feasible visual appearances), learning a deterministic mapping as previous works do introduces ambiguity during training and thus leads to inferior visual results. Although this one-to-many mapping could be alleviated in part by a two-stage framework (i.e., an audio-to-expression …
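Why a deterministic mapping struggles with a one-to-many target can be seen in a toy setting. Under a mean-squared-error (MSE) loss, a deterministic model that sees one input paired with several valid targets is pushed toward their average, which may match none of them. The sketch below is only an illustration under stated assumptions: scalar numbers stand in for video frames, the target values are hypothetical, and MSE is an assumed loss. It is not the authors' method and not necessarily the loss used in the prior works they cite.

```python
# Toy illustration (hypothetical values): one input ("speech content") has two
# equally valid outputs ("visual appearances"). A deterministic model produces a
# single output per input, so for this fixed input it is just one number.

def mse(prediction, targets):
    """Mean squared error of one scalar prediction against several valid targets."""
    return sum((prediction - t) ** 2 for t in targets) / len(targets)

def fit_deterministic(targets, lr=0.1, steps=200):
    """Plain gradient descent on MSE for the model's single output on this input."""
    pred = 0.0
    for _ in range(steps):
        # d/dpred of mean((pred - t)^2) is mean(2 * (pred - t))
        grad = sum(2.0 * (pred - t) for t in targets) / len(targets)
        pred -= lr * grad
    return pred

feasible = [0.2, 1.0]  # two hypothetical, equally plausible outputs
pred = fit_deterministic(feasible)

print(f"learned output: {pred:.3f}")                           # 0.600 (the average)
print(f"loss at learned output: {mse(pred, feasible):.3f}")    # 0.160
print(f"loss at a feasible output: {mse(feasible[0], feasible):.3f}")  # 0.320
print(f"distance to nearest feasible output: {min(abs(pred - t) for t in feasible):.3f}")  # 0.400
```

The loss rates the average (0.6) as better than either valid output, so training converges to a value that is 0.4 away from every feasible answer; in images this averaging tends to show up as blur. Approaches that model a distribution over outputs, or that supply extra information to disambiguate the target, avoid this collapse.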