all AI news
Object-Attribute Binding in Text-to-Image Generation: Evaluation and Control
April 23, 2024, 4:46 a.m. | Maria Mihaela Trusca, Wolf Nuyts, Jonathan Thomm, Robert Honig, Thomas Hofmann, Tinne Tuytelaars, Marie-Francine Moens
cs.CV updates on arXiv.org arxiv.org
Abstract: Current diffusion models create photorealistic images given a text prompt as input but struggle to correctly bind attributes mentioned in the text to the right objects in the image. This is evidenced by our novel image-graph alignment model called EPViT (Edge Prediction Vision Transformer) for the evaluation of image-text alignment. To alleviate the above problem, we propose focused cross-attention (FCA) that controls the visual attention maps by syntactic constraints found in the input sentence. Additionally, …
abstract alignment arxiv control create cs.cv current diffusion diffusion models edge evaluation graph image image generation images novel object objects photorealistic photorealistic images prediction prompt struggle text text-to-image transformer type vision
More from arxiv.org / cs.CV updates on arXiv.org
Jobs in AI, ML, Big Data
AI Research Scientist
@ Vara | Berlin, Germany and Remote
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Data Science Analyst
@ Mayo Clinic | AZ, United States
Sr. Data Scientist (Network Engineering)
@ SpaceX | Redmond, WA