March 26, 2024, 4:48 a.m. | Zhongwei Zhang, Fuchen Long, Yingwei Pan, Zhaofan Qiu, Ting Yao, Yang Cao, Tao Mei

cs.CV updates on arXiv.org arxiv.org

arXiv:2403.17005v1 Announce Type: new
Abstract: Recent advances in text-to-video generation have demonstrated the utility of powerful diffusion models. Nevertheless, the problem is not trivial when shaping diffusion models to animate static image (i.e., image-to-video generation). The difficulty originates from the aspect that the diffusion process of subsequent animated frames should not only preserve the faithful alignment with the given image but also pursue temporal coherence among adjacent frames. To alleviate this, we present TRIP, a new recipe of image-to-video diffusion …

arxiv cs.cv cs.mm diffusion diffusion models image image-to-video noise prior residual temporal trip type video video diffusion

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

AIML - Sr Machine Learning Engineer, Data and ML Innovation

@ Apple | Seattle, WA, United States

Senior Data Engineer

@ Palta | Palta Cyprus, Palta Warsaw, Palta remote