Feb. 29, 2024, 5:45 a.m. | Meidai Xuanyuan, Yuwang Wang, Honglei Guo, Qionghai Dai

cs.CV updates on arXiv.org

arXiv:2402.18092v1 Announce Type: new
Abstract: In this paper, we consider a novel and practical case for talking face video generation. Specifically, we focus on the scenarios involving multi-people interactions, where the talking context, such as audience or surroundings, is present. In these situations, the video generation should take the context into consideration in order to generate video content naturally aligned with driving audios and spatially coherent to the context. To achieve this, we provide a two-stage and cross-modal controllable video …
