all AI news
Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models
March 12, 2024, 4:48 a.m. | Yang Zhang, Teoh Tze Tzun, Lim Wei Hern, Tiviatis Sim, Kenji Kawaguchi
cs.CV updates on arXiv.org arxiv.org
Abstract: Recent advancements in diffusion models have notably improved the perceptual quality of generated images in text-to-image synthesis tasks. However, diffusion models often struggle to produce images that accurately reflect the intended semantics of the associated text prompts. We examine cross-attention layers in diffusion models and observe a propensity for these layers to disproportionately focus on certain tokens during the generation process, thereby undermining semantic fidelity. To address the issue of dominant attention, we introduce attention …
arxiv attention cs.cv diffusion diffusion models fidelity image regulation semantic synthesis text text-to-image type
More from arxiv.org / cs.CV updates on arXiv.org
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
2 days, 2 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Principal Data Engineering Manager
@ Microsoft | Redmond, Washington, United States
Machine Learning Engineer
@ Apple | San Diego, California, United States