June 18, 2024, 4:49 a.m. | Jaewoo Lee, Boyang Li, Sung Ju Hwang

cs.LG updates on arXiv.org arxiv.org

arXiv:2406.10995v1 Announce Type: cross
Abstract: Instruction tuning, or supervised finetuning on extensive task-specific data, is necessary for Large Vision-Language Models (LVLMs) to generalize well across a broad range of vision-language (VL) tasks. However, training on large VL datasets can become prohibitively expensive. In this work, we introduce COINCIDE, an effective and scalable data selection technique that uses a small model as a reference model to select visual instruction tuning data for efficient finetuning of a target LVLM, focusing on diversity …

abstract arxiv become concept cs.cv cs.lg data datasets finetuning however instruction tuning language language models scalable skill tasks training tuning type vision vision-language vision-language models work

Data Scientist

@ Ford Motor Company | Chennai, Tamil Nadu, India

Systems Software Engineer, Graphics

@ Parallelz | Vancouver, British Columbia, Canada - Remote

Engineering Manager - Geo Engineering Team (F/H/X)

@ AVIV Group | Paris, France

Data Analyst

@ Microsoft | San Antonio, Texas, United States

Azure Data Engineer

@ TechVedika | Hyderabad, India

Senior Data & AI Threat Detection Researcher (Cortex)

@ Palo Alto Networks | Tel Aviv-Yafo, Israel