March 7, 2024, 5:45 a.m. | Tianyi Song, Jiuxin Cao, Kun Wang, Bo Liu, Xiaofeng Zhang

cs.CV updates on arXiv.org

arXiv:2309.09553v4 Announce Type: replace
Abstract: The excellent text-to-image synthesis capability of diffusion models has driven progress in synthesizing coherent visual stories. The current state-of-the-art method combines the features of historical captions, historical frames, and the current caption as conditions for generating the current frame. However, this method treats every historical frame and caption as contributing equally. It connects them in order with equal weights, ignoring the fact that not all historical conditions are relevant to the generation of the current frame. …

