March 14, 2024, 4:46 a.m. | Christos Papadimitriou, Giorgos Filandrianos, Maria Lymperaiou, Giorgos Stamou

arXiv:2403.08502v1 Announce Type: new
Abstract: Story Visualization (SV) is a challenging generative vision task that requires both visual quality and consistency between different frames in generated image sequences. Previous approaches either employ a memory mechanism to maintain context throughout auto-regressive generation of the image sequence, or model the generation of the characters and their background separately to improve the rendering of characters. In contrast, we embrace a completely parallel transformer-based approach, exclusively relying on Cross-Attention with …

