InstructAny2Pix: Flexible Visual Editing via Multimodal Instruction Following | allainews.com

April 29, 2024, 4:45 a.m. | Shufan Li, Harkanwar Singh, Aditya Grover

cs.CV updates on arXiv.org | arxiv.org

arXiv:2312.06738v3 Announce Type: replace
Abstract: The ability to provide fine-grained control for generating and editing visual imagery has profound implications for computer vision and its applications. Previous works have explored extending controllability in two directions: instruction tuning with text-based prompts and multi-modal conditioning. However, these works make one or more unnatural assumptions on the number and/or type of modality inputs used to express controllability. We propose InstructAny2Pix, a flexible multi-modal instruction-following system that enables users to edit an input image …


