all AI news
Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
April 5, 2024, 4:45 a.m. | AJ Piergiovanni, Isaac Noble, Dahun Kim, Michael S. Ryoo, Victor Gomes, Anelia Angelova
cs.CV updates on arXiv.org arxiv.org
Abstract: One of the main challenges of multimodal learning is the need to combine heterogeneous modalities (e.g., video, audio, text). For example, video and audio are obtained at much higher rates than text and are roughly aligned in time. They are often not synchronized with text, which comes as a global context, e.g., a title, or a description. Furthermore, video and audio inputs are of much larger volumes, and grow as the video length increases, which …
abstract arxiv audio autoregressive model challenges cs.cv example mirasol3b multimodal multimodal learning text type video
More from arxiv.org / cs.CV updates on arXiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
MLOps Engineer - Hybrid Intelligence
@ Capgemini | Madrid, M, ES
Analista de Business Intelligence (Industry Insights)
@ NielsenIQ | Cotia, Brazil