SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models
May 3, 2024, 4:58 a.m. | Burak Can Biner, Farrin Marouf Sofian, Umur Berkay Karakaş, Duygu Ceylan, Erkut Erdem, Aykut Erdem
cs.CV updates on arXiv.org
Abstract: We are witnessing a revolution in conditional image synthesis with the recent success of large-scale text-to-image generation methods. This success also opens up new opportunities for controlling the generation and editing process using multi-modal input. While spatial control using cues such as depth, sketch, and other images has attracted a lot of research, we argue that another equally effective modality is audio, since sound and sight are two main components of human perception. Hence, …