April 9, 2024, 4:48 a.m. | Kunpeng Song, Yizhe Zhu, Bingchen Liu, Qing Yan, Ahmed Elgammal, Xiao Yang

cs.CV updates on arXiv.org

arXiv:2404.05674v1 Announce Type: new
Abstract: In this paper, we present MoMA: an open-vocabulary, training-free personalized image model that boasts flexible zero-shot capabilities. As foundational text-to-image models rapidly evolve, the demand for robust image-to-image translation grows. Addressing this need, MoMA specializes in subject-driven personalized image generation. Utilizing an open-source, Multimodal Large Language Model (MLLM), we train MoMA to serve a dual role as both a feature extractor and a generator. This approach effectively synergizes reference image and text prompt information to …
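The abstract describes a pipeline in which the MLLM fuses reference-image and text-prompt information into subject-aware features that condition the image generator. Below is a minimal conceptual sketch of that data flow, assuming a PyTorch setting; all module names, dimensions, and the cross-attention-style injection are illustrative assumptions, not MoMA's released implementation.

```python
# Hypothetical sketch: an MLLM-like extractor fuses reference-image and prompt
# features, and a small adapter injects them into a frozen backbone's hidden
# states. Names and shapes are assumptions for illustration only.
import torch
import torch.nn as nn


class MultimodalFeatureExtractor(nn.Module):
    """Stand-in for the MLLM: fuses reference-image and prompt tokens."""

    def __init__(self, dim=768):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(dim, dim),
            nn.GELU(),
            nn.Linear(dim, dim),
        )

    def forward(self, image_feats, text_feats):
        # Treat image tokens and prompt tokens as one joint sequence,
        # then project it into a shared subject-aware embedding space.
        tokens = torch.cat([image_feats, text_feats], dim=1)
        return self.fuse(tokens)


class SubjectAdapter(nn.Module):
    """Illustrative adapter: conditions backbone hidden states on subject
    features via one cross-attention layer, leaving backbone weights frozen."""

    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, hidden_states, subject_feats):
        attended, _ = self.attn(hidden_states, subject_feats, subject_feats)
        return self.norm(hidden_states + attended)


if __name__ == "__main__":
    batch, img_tokens, txt_tokens, latent_tokens, dim = 1, 16, 8, 64, 768
    image_feats = torch.randn(batch, img_tokens, dim)  # reference-image encoder output
    text_feats = torch.randn(batch, txt_tokens, dim)   # text-prompt encoder output
    hidden = torch.randn(batch, latent_tokens, dim)    # generator hidden states

    extractor = MultimodalFeatureExtractor(dim)
    adapter = SubjectAdapter(dim)

    subject_feats = extractor(image_feats, text_feats)
    conditioned = adapter(hidden, subject_feats)
    print(conditioned.shape)  # torch.Size([1, 64, 768])
```

Because the adapter only reads the fused features through cross-attention, the backbone itself can stay untouched, which is consistent with the training-free, zero-shot personalization the abstract emphasizes.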

