all AI news
Exploring the Transferability of Visual Prompting for Multimodal Large Language Models
April 18, 2024, 4:44 a.m. | Yichi Zhang, Yinpeng Dong, Siyuan Zhang, Tianzan Min, Hang Su, Jun Zhu
cs.CV updates on arXiv.org arxiv.org
Abstract: Although Multimodal Large Language Models (MLLMs) have demonstrated promising versatile capabilities, their performance is still inferior to specialized models on downstream tasks, which makes adaptation necessary to enhance their utility. However, fine-tuning methods require independent training for every model, leading to huge computation and memory overheads. In this paper, we propose a novel setting where we aim to improve the performance of diverse MLLMs with a group of shared parameters optimized for a downstream task. …
abstract arxiv capabilities computation cs.ai cs.cv cs.lg every fine-tuning however independent language language models large language large language models mllms multimodal performance prompting tasks training type utility visual visual prompting
More from arxiv.org / cs.CV updates on arXiv.org
Jobs in AI, ML, Big Data
Software Engineer for AI Training Data (School Specific)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Python)
@ G2i Inc | Remote
Software Engineer for AI Training Data (Tier 2)
@ G2i Inc | Remote
Data Engineer
@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania
Artificial Intelligence – Bioinformatic Expert
@ University of Texas Medical Branch | Galveston, TX
Lead Developer (AI)
@ Cere Network | San Francisco, US