Feb. 12, 2024, 5:43 a.m. | Josh Gardner, Simon Durand, Daniel Stoller, Rachel M. Bittner

cs.LG updates on arXiv.org

Music has a unique and complex structure which is challenging for both expert humans and existing AI systems to understand, and presents unique challenges relative to other forms of audio. We present LLark, an instruction-tuned multimodal model for music understanding. We detail our process for dataset creation, which involves augmenting the annotations of diverse open-source music datasets and converting them to a unified instruction-tuning format. We propose a multimodal architecture for LLark, integrating a pretrained generative model for music with …

cs.lg cs.sd eess.as
