March 29, 2024, 4:45 a.m. | Mingze Sun, Chao Xu, Xinyu Jiang, Yang Liu, Baigui Sun, Ruqi Huang

cs.CV updates on arXiv.org

arXiv:2403.19467v1 Announce Type: new
Abstract: In this paper, we introduce an innovative task focused on human communication, aiming to generate 3D holistic human motions for both speakers and listeners. Central to our approach is the incorporation of factorization to decouple audio features and the combination of textual semantic information, thereby facilitating the creation of more realistic and coordinated movements. We separately train VQ-VAEs with respect to the holistic motions of both speaker and listener. We consider the real-time mutual influence …
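The abstract mentions training a separate VQ-VAE for each role (speaker and listener) over holistic motion sequences. The paper's exact architecture is not given in the abstract, so the sketch below is only a minimal, illustrative VQ-VAE over motion feature sequences; the layer sizes, codebook size, and the motion dimensionality (156, a SMPL-X-like parameter count) are assumptions, not the authors' settings.

```python
# Minimal VQ-VAE sketch for sequences of holistic motion features.
# All dimensions and names are illustrative assumptions, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup with a straight-through gradient."""

    def __init__(self, num_codes=512, code_dim=128, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z_e):                                # z_e: (B, T, D)
        flat = z_e.reshape(-1, z_e.size(-1))               # (B*T, D)
        dist = torch.cdist(flat, self.codebook.weight)     # (B*T, K)
        idx = dist.argmin(dim=-1)
        z_q = self.codebook(idx).view_as(z_e)
        # Codebook + commitment losses (standard VQ-VAE objective).
        loss = F.mse_loss(z_q, z_e.detach()) + self.beta * F.mse_loss(z_e, z_q.detach())
        # Straight-through estimator: pass decoder gradients to the encoder.
        z_q = z_e + (z_q - z_e).detach()
        return z_q, loss, idx.view(z_e.shape[:-1])


class MotionVQVAE(nn.Module):
    """Encode a holistic motion sequence into discrete tokens and reconstruct it."""

    def __init__(self, motion_dim=156, hidden=128, num_codes=512):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(motion_dim, hidden, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv1d(hidden, hidden, 4, stride=2, padding=1),
        )
        self.quantizer = VectorQuantizer(num_codes, hidden)
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(hidden, hidden, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose1d(hidden, motion_dim, 4, stride=2, padding=1),
        )

    def forward(self, motion):                             # motion: (B, T, motion_dim)
        z_e = self.encoder(motion.transpose(1, 2)).transpose(1, 2)
        z_q, vq_loss, tokens = self.quantizer(z_e)
        recon = self.decoder(z_q.transpose(1, 2)).transpose(1, 2)
        recon_loss = F.mse_loss(recon, motion)
        return recon, recon_loss + vq_loss, tokens


# Two independent codebooks, one per role, as the abstract describes.
speaker_vqvae = MotionVQVAE()
listener_vqvae = MotionVQVAE()
recon, loss, tokens = speaker_vqvae(torch.randn(2, 64, 156))
```

Keeping the speaker and listener codebooks separate lets each role learn its own motion vocabulary, which is one plausible reading of the abstract's "separately train VQ-VAEs" step.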

