June 7, 2024, 4:44 a.m. | Zeyue Tian, Zhaoyang Liu, Ruibin Yuan, Jiahao Pan, Xiaoqiang Huang, Qifeng Liu, Xu Tan, Qifeng Chen, Wei Xue, Yike Guo

cs.LG updates on arXiv.org arxiv.org

arXiv:2406.04321v1 Announce Type: cross
Abstract: In this work, we systematically study music generation conditioned solely on the video. First, we present a large-scale dataset comprising 190K video-music pairs, including various genres such as movie trailers, advertisements, and documentaries. Furthermore, we propose VidMuse, a simple framework for generating music aligned with video inputs. VidMuse stands out by producing high-fidelity music that is both acoustically and semantically aligned with the video. By incorporating local and global visual cues, VidMuse enables the creation …

arxiv cs.cv cs.lg cs.mm cs.sd framework modeling music music generation simple type video

Senior Data Engineer

@ Displate | Warsaw

Principal Software Engineer

@ Microsoft | Prague, Prague, Czech Republic

Sr. Global Reg. Affairs Manager

@ BASF | Research Triangle Park, NC, US, 27709-3528

Senior Robot Software Developer

@ OTTO Motors by Rockwell Automation | Kitchener, Ontario, Canada

Coop - Technical Service Hub Intern

@ Teradyne | Santiago de Queretaro, MX

Coop - Technical - Service Inside Sales Intern

@ Teradyne | Santiago de Queretaro, MX