Nov. 5, 2023, 6:48 a.m. | Shijie Ma, Huayi Xu, Mengjian Li, Weidong Geng, Meng Wang, Yaxiong Wang

cs.CV updates on arXiv.org

Despite the remarkable progress in text-to-video generation, existing
diffusion-based models often exhibit instability with respect to the input
noise during inference. Specifically, when different noise samples are fed for
the same text, these models produce videos that differ significantly in both
frame quality and temporal consistency. Based on this observation, we posit
that there exists an optimal noise matched to each textual input; however, the
widely adopted strategy of random noise sampling often fails to capture it. In
this paper, we argue …
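
Below is a minimal sketch (not from the paper) of the kind of experiment that exposes the instability described above: feeding several different initial noises, fixed via seeds, to an off-the-shelf text-to-video diffusion pipeline for the same prompt, then comparing the resulting clips. The Hugging Face diffusers API and the damo-vilab/text-to-video-ms-1.7b checkpoint are assumptions chosen for illustration, not choices stated in the abstract.

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Load an off-the-shelf text-to-video diffusion model (assumed checkpoint).
pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16
).to("cuda")

prompt = "a panda playing guitar on a beach"

# Same text, different initial noises: each seed fixes a distinct Gaussian
# noise tensor that the sampler starts from, so the resulting clips can be
# compared for frame quality and temporal consistency.
for seed in (0, 1, 2):
    generator = torch.Generator(device="cuda").manual_seed(seed)
    result = pipe(prompt, num_inference_steps=25, generator=generator)
    frames = result.frames[0]  # note: .frames layout may vary across diffusers versions
    export_to_video(frames, f"sample_seed_{seed}.mp4")
```

Inspecting the exported clips side by side is the informal version of the comparison the abstract refers to: the prompt is held fixed and only the initial noise changes.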
