MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models
April 16, 2024, 4:48 a.m. | Nithin Gopalakrishnan Nair, Jeya Maria Jose Valanarasu, Vishal M Patel
cs.CV updates on arXiv.org arxiv.org
Abstract: Large diffusion-based Text-to-Image (T2I) models have shown impressive generative powers for text-to-image generation as well as spatially conditioned image generation. For most applications, we can train the model end-to-end with paired data to obtain photorealistic generation quality. However, to add an additional task, one often needs to retrain the model from scratch using paired data across all modalities to retain good generation performance. In this paper, we tackle this issue and propose a novel strategy …