MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response
April 3, 2024, 4:47 a.m. | Zihao Deng, Yinghao Ma, Yudong Liu, Rongchen Guo, Ge Zhang, Wenhu Chen, Wenhao Huang, Emmanouil Benetos
Source: cs.CL updates on arXiv.org
Abstract: Large Language Models (LLMs) have shown immense potential in multimodal applications, yet the convergence of the textual and musical domains remains underexplored. To address this gap, we present MusiLingo, a novel system for music caption generation and music-related query responses. MusiLingo employs a single projection layer to align music representations from the frozen, pre-trained music audio model MERT with a frozen LLM, bridging the gap between music audio and textual contexts. We train it on …
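The alignment idea in the abstract, one trainable projection between a frozen audio encoder and a frozen LLM, can be illustrated with a toy sketch. This is not the authors' implementation: the dimensions, the random initialisation, the function names (make_projection, project) and the way the projected frames are placed before the text embeddings are all assumptions made for illustration. Plain Python lists stand in for the MERT features and the LLM token embeddings.

```python
import random

# Hypothetical toy sizes; the real encoder and LLM dimensions are much larger.
MUSIC_DIM = 4   # assumed size of one audio-encoder frame embedding
LLM_DIM = 6     # assumed size of one LLM token embedding


def make_projection(in_dim, out_dim, seed=0):
    """Create the only trainable parameters in this sketch: W (out_dim x in_dim) and b."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(frames, weight, bias):
    """Apply y = W x + b to every frame, mapping audio features into the LLM embedding space."""
    return [
        [sum(w * x for w, x in zip(row, frame)) + b for row, b in zip(weight, bias)]
        for frame in frames
    ]


weight, bias = make_projection(MUSIC_DIM, LLM_DIM)

# Stand-in for the output of a frozen audio encoder: three frames of MUSIC_DIM features.
music_frames = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.0, 0.0, 0.0, 0.0],
]
music_tokens = project(music_frames, weight, bias)

# Stand-in for the frozen LLM's embeddings of a two-token text prompt.
text_tokens = [[0.0] * LLM_DIM for _ in range(2)]

# Assumed layout: projected music frames are placed before the text embeddings.
llm_input = music_tokens + text_tokens
print(len(llm_input), len(llm_input[0]))
```

In a setup like this, only weight and bias would be updated during training, while the encoder and the LLM stay fixed, which is what keeps the number of trainable parameters small.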