Optimal Noise Pursuit for Augmenting Text-to-Video Generation. (arXiv:2311.00949v1 [cs.CV])
cs.CV updates on arXiv.org
Despite the remarkable progress in text-to-video generation, existing
diffusion-based models are often unstable with respect to the noise sampled at
inference. Specifically, when different noise samples are fed for the same text,
these models produce videos that differ significantly in terms of both frame
quality and temporal consistency. With this observation, we posit that there
exists an optimal noise matched to each textual input; however, the widely
adopted strategies of random noise sampling often fail to capture it. In this
paper, we argue …