UniCode: Learning a Unified Codebook for Multimodal Large Language Models
March 15, 2024, 4:45 a.m. | Sipeng Zheng, Bohan Zhou, Yicheng Feng, Ye Wang, Zongqing Lu
cs.CV updates on arXiv.org
Abstract: In this paper, we propose UniCode, a novel approach within the domain of multimodal large language models (MLLMs) that learns a unified codebook to efficiently tokenize visual, text, and potentially other types of signals. This innovation addresses a critical limitation in existing MLLMs: their reliance on a text-only codebook, which restricts their ability to generate images and text in a multimodal context. To this end, we propose a language-driven iterative training paradigm, coupled with an …