VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning
June 21, 2024, 4:51 a.m. | Ziyang Meng, Yu Dai, Zezheng Gong, Shaoxiong Guo, Minglong Tang, Tongquan Wei
cs.CV updates on arXiv.org
Abstract: Recent advances in Large Vision-Language Models (LVLMs) have significantly improved performance in image comprehension tasks, such as formatted charts and rich-content images. Yet, Graphical User Interfaces (GUIs) pose a greater challenge due to their structured format and detailed textual information. Existing LVLMs often overly depend on internal knowledge and neglect image content, resulting in hallucinations and incorrect responses in GUI comprehension. To address these issues, we introduce VGA, a fine-tuned model designed for comprehensive GUI understanding. …