LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs. (arXiv:2401.16160v2 [cs.CV] UPDATED)
cs.CV updates on arXiv.org
Instruction finetuning on a variety of image-text instruction data is the key to obtaining a versatile Multimodal Large Language Model (MLLM), and different configurations of the instruction data can lead to finetuned models with different capabilities. However, we have discovered that data conflicts are inevitable when mixing instruction data from distinct domains, which can result in performance drops for tasks of a specific domain. To address this issue, we propose to apply an efficient Mixture of Experts (MoE) design, which …
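The excerpt cuts off before the paper's concrete routing and placement details, so the following is only a rough illustration of the general idea of a sparse mixture of LoRA experts: a small router picks one low-rank adapter per token on top of a frozen linear layer. The class and parameter names (MoLELinear, num_experts, rank) are hypothetical, and the top-1 routing shown here is a common simplification, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class MoLELinear(nn.Module):
    """Sketch: a frozen base linear layer plus a set of LoRA experts.

    A lightweight router selects one expert per token (top-1 routing), so each
    token's output is W x + scale * gate * B_e A_e x for its chosen expert e.
    """

    def __init__(self, in_features, out_features, num_experts=4, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)   # base weights stay frozen
        self.base.bias.requires_grad_(False)
        self.router = nn.Linear(in_features, num_experts)
        # One low-rank (A, B) pair per expert; B is zero-initialized as in LoRA,
        # so every expert starts as a no-op.
        self.lora_A = nn.Parameter(torch.randn(num_experts, rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_experts, out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):                         # x: (batch, seq, in_features)
        logits = self.router(x)                   # (batch, seq, num_experts)
        probs = F.softmax(logits, dim=-1)
        expert = probs.argmax(dim=-1)             # top-1 expert index per token
        gate = probs.gather(-1, expert.unsqueeze(-1))  # keeps the router trainable

        # Gather each token's expert-specific LoRA matrices and apply them.
        A = self.lora_A[expert]                   # (batch, seq, rank, in_features)
        B = self.lora_B[expert]                   # (batch, seq, out_features, rank)
        low_rank = torch.einsum("bsri,bsi->bsr", A, x)
        delta = torch.einsum("bsor,bsr->bso", B, low_rank)
        return self.base(x) + self.scale * gate * delta


# Tiny usage example on random token features.
layer = MoLELinear(in_features=32, out_features=32, num_experts=4, rank=4)
hidden = torch.randn(2, 5, 32)
out = layer(hidden)
print(out.shape)  # torch.Size([2, 5, 32])

Because only one expert's low-rank update is applied per token, the added compute stays close to plain LoRA while tokens from different instruction domains can be routed to different adapters, which is the mechanism the abstract points to for easing cross-domain data conflicts.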