all AI news
Text-to-Audio Generation Synchronized with Videos
March 14, 2024, 4:42 a.m. | Shentong Mo, Jing Shi, Yapeng Tian
cs.LG updates on arXiv.org arxiv.org
Abstract: In recent times, the focus on text-to-audio (TTA) generation has intensified, as researchers strive to synthesize audio from textual descriptions. However, most existing methods, though leveraging latent diffusion models to learn the correlation between audio and text embeddings, fall short when it comes to maintaining a seamless synchronization between the produced audio and its video. This often results in discernible audio-visual mismatches. To bridge this gap, we introduce a groundbreaking benchmark for Text-to-Audio generation that …
abstract arxiv audio audio generation correlation cs.ai cs.cv cs.lg cs.mm cs.sd diffusion diffusion models eess.as embeddings focus however latent diffusion models learn researchers synchronization text textual type videos
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
Software Engineer for AI Training Data (School Specific)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Python)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Tier 2)
@ G2i Inc | Remote
Data Engineer
@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania
Artificial Intelligence – Bioinformatic Expert
@ University of Texas Medical Branch | Galveston, TX
Lead Developer (AI)
@ Cere Network | San Francisco, US