May 1, 2024, 4:46 a.m. | Shivam Mehta, Anna Deichler, Jim O'Regan, Birger Moëll, Jonas Beskow, Gustav Eje Henter, Simon Alexanderson

cs.CV updates on arXiv.org arxiv.org

arXiv:2404.19622v1 Announce Type: cross
Abstract: Although humans engaged in face-to-face conversation communicate verbally and non-verbally at the same time, the joint and unified synthesis of speech audio and co-speech 3D gesture motion from text is a new and emerging field. These technologies hold great promise for more human-like, efficient, expressive, and robust synthetic communication, but are currently held back by the lack of suitably large datasets, since existing methods are trained on parallel data from all constituent modalities. Inspired by student-teacher …

arxiv cs.cv cs.gr cs.hc cs.sd eess.as multimodal speech synthesis synthetic data
