End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks. (arXiv:2104.13332v3 [cs.LG] UPDATED)
Aug. 17, 2022, 1:12 a.m. | Rodrigo Mira, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Björn W. Schuller, Maja Pantic
cs.CV updates on arXiv.org arxiv.org
Video-to-speech is the process of reconstructing the audio speech from a
video of a spoken utterance. Previous approaches to this task have relied on a
two-step process where an intermediate representation is inferred from the
video, and is then decoded into waveform audio using a vocoder or a waveform
reconstruction algorithm. In this work, we propose a new end-to-end
video-to-speech model based on Generative Adversarial Networks (GANs) which
translates spoken video to waveform end-to-end without using any intermediate
representation or …