Audio-visual video face hallucination with frequency supervision and cross modality support by speech based lip reading loss. (arXiv:2211.10883v1 [cs.CV])
cs.CV updates on arXiv.org
Recently, there have been numerous breakthroughs in face hallucination tasks.
However, the task remains considerably more challenging for videos than for
images because of inherent consistency issues. The extra temporal
dimension in video face hallucination makes it non-trivial to learn the facial
motion throughout the sequence. To learn these fine spatio-temporal
motion details, we propose a novel cross-modal audio-visual Video Face
Hallucination Generative Adversarial Network (VFH-GAN). The architecture
exploits the semantic correlation between …