March 20, 2024, 4:43 a.m. | Yangyi Chen, Karan Sikka, Michael Cogswell, Heng Ji, Ajay Divakaran

cs.LG updates on arXiv.org

arXiv:2311.10081v2 Announce Type: replace-cross
Abstract: We present DRESS, a large vision-language model (LVLM) that exploits natural language feedback (NLF) from large language models to improve its alignment and interactions, addressing two key limitations of state-of-the-art LVLMs. First, prior LVLMs generally rely only on the instruction finetuning stage to improve alignment with human preferences. Without incorporating extra feedback, they remain prone to generating unhelpful, hallucinated, or harmful responses. Second, while the visual instruction tuning data is …
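The abstract above gives no implementation details, so as a rough illustration only, here is a minimal PyTorch sketch of one plausible way to fold LLM-generated natural language feedback into LVLM fine-tuning: the critique tokens are concatenated before the target response as conditioning context. Every name here (ToyLVLM, nlf_finetune_step, the tensor shapes, the dummy data) is a hypothetical stand-in, not the DRESS implementation.

```python
# A minimal sketch (NOT the authors' released code) of conditioning an LVLM's
# response on natural language feedback (NLF) during a fine-tuning step.
import torch
import torch.nn as nn

class ToyLVLM(nn.Module):
    """Toy stand-in for an LVLM: fuses one visual token with text tokens."""
    def __init__(self, vocab_size: int = 1000, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.vision_proj = nn.Linear(512, dim)   # maps image features to text space
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.lm_head = nn.Linear(dim, vocab_size)

    def forward(self, image_feats, token_ids):
        # Prepend the projected image feature as a single visual "token".
        vis = self.vision_proj(image_feats).unsqueeze(1)
        txt = self.embed(token_ids)
        hidden, _ = self.decoder(torch.cat([vis, txt], dim=1))
        return self.lm_head(hidden[:, 1:, :])    # logits for the text positions only

def nlf_finetune_step(model, optimizer, image_feats, feedback_ids, response_ids):
    """One gradient step that conditions the response on NLF tokens.

    The feedback tokens (e.g. a tokenized LLM critique of a prior draft) are
    concatenated before the target response, so the model learns to produce
    responses consistent with the critique -- one plausible use of NLF.
    """
    inputs = torch.cat([feedback_ids, response_ids], dim=1)
    logits = model(image_feats, inputs[:, :-1])
    # Supervise only the response span; the critique is context, not a target.
    targets = inputs[:, 1:].clone()
    targets[:, : feedback_ids.size(1) - 1] = -100   # ignore feedback positions
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1), ignore_index=-100
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

model = ToyLVLM()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
image_feats = torch.randn(2, 512)                # dummy CLIP-style image features
feedback_ids = torch.randint(0, 1000, (2, 8))    # tokenized LLM critique (dummy)
response_ids = torch.randint(0, 1000, (2, 16))   # tokenized target response (dummy)
print(nlf_finetune_step(model, opt, image_feats, feedback_ids, response_ids))
```

In a real system the toy GRU would be a pretrained LVLM and the dummy tensors would come from a tokenizer and vision encoder; the point of the sketch is only the masking pattern, which trains on the response while treating the critique as conditioning context.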

