Feb. 28, 2024, 6:29 p.m. | /u/Successful-Western27

Machine Learning www.reddit.com

Researchers have long struggled to make AI-generated talking head videos that capture the nuance of human facial expressions and speech. Existing methods usually fail to replicate the fluidity and synchronization of real human mouths and faces.

A new paper from Alibaba proposes **EMO**, an AI system that achieves unprecedented realism in synthesized talking head videos using a novel diffusion model approach.

EMO generates videos directly from audio clips and portrait images, without 3D graphics or animation:

* **Audio encoder** analyzes tone, rhythm …
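At a high level, this kind of audio-conditioned diffusion pipeline can be sketched as below. This is a toy illustration only: the function names, feature choices, and the simplified denoising loop are assumptions for clarity, not EMO's actual architecture.

```python
import numpy as np

def encode_audio(audio: np.ndarray, dim: int = 16) -> np.ndarray:
    """Toy audio encoder: summarize a waveform into a fixed-size
    conditioning vector (a crude stand-in for a learned encoder that
    captures tone and rhythm)."""
    # RMS energy per window as a rough proxy for rhythm/intensity.
    windows = np.array_split(audio, dim)
    return np.array([np.sqrt(np.mean(w ** 2)) for w in windows])

def generate_frames(portrait: np.ndarray, audio: np.ndarray,
                    n_frames: int = 8, steps: int = 10,
                    seed: int = 0) -> np.ndarray:
    """Toy diffusion-style sampler: start from noise and iteratively
    'denoise' toward the reference portrait, modulated by the audio
    conditioning vector. Purely illustrative -- not the EMO model."""
    rng = np.random.default_rng(seed)
    cond = encode_audio(audio)                 # (16,) conditioning vector
    h, w = portrait.shape
    frames = rng.standard_normal((n_frames, h, w))
    for t in range(steps):
        # Each step pulls frames toward the portrait; the audio
        # condition perturbs frames over time (a proxy for driving
        # lip and expression motion from speech).
        alpha = (t + 1) / steps
        drive = cond[:n_frames, None, None]
        frames = (1 - alpha) * frames + alpha * (portrait + 0.05 * drive)
    return frames

# Hypothetical usage: one portrait image plus an audio clip in,
# a short sequence of video frames out.
audio = np.sin(np.linspace(0, 20 * np.pi, 1600))
portrait = np.zeros((4, 4))
frames = generate_frames(portrait, audio)
```

The key idea mirrored here is that no 3D graphics or explicit animation rig is involved: the audio signal conditions the generative sampling loop directly.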

