March 27, 2024, 4:46 a.m. | Peiang Zhao, Han Li, Ruiyang Jin, S. Kevin Zhou

cs.CV updates on arXiv.org

arXiv:2311.12342v3 Announce Type: replace
Abstract: Recent text-to-image diffusion models have reached an unprecedented level in generating high-quality images. However, their exclusive reliance on textual prompts often falls short of precise control over image composition. In this paper, we propose LoCo, a training-free approach for layout-to-image synthesis that excels in producing high-quality images aligned with both textual prompts and layout instructions. Specifically, we introduce a Localized Attention Constraint (LAC), leveraging semantic affinity between pixels in self-attention maps to create precise representations …

