VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning | allainews.com

June 21, 2024, 4:51 a.m. | Ziyang Meng, Yu Dai, Zezheng Gong, Shaoxiong Guo, Minglong Tang, Tongquan Wei

cs.CV updates on arXiv.org arxiv.org

arXiv:2406.14056v1 Announce Type: new
Abstract: Recent advances in Large Vision-Language Models (LVLMs) have significantly improve performance in image comprehension tasks, such as formatted charts and rich-content images. Yet, Graphical User Interface (GUI) pose a greater challenge due to their structured format and detailed textual information. Existing LVLMs often overly depend on internal knowledge and neglect image content, resulting in hallucinations and incorrect responses in GUI comprehension.To address these issues, we introduce VGA, a fine-tuned model designed for comprehensive GUI understanding. …

abstract advances arxiv assistant challenge charts cs.cv fine-tuning format gui hallucinations image images information language language models performance tasks textual through tuning type vision vision-language vision-language models

More from arxiv.org / cs.CV updates on arXiv.org

PlaNet-S: Automatic Semantic Segmentation of Placenta 15 hours ago | arxiv.org

abstract architectures arxiv automated +15

FDDM: Unsupervised Medical Image Translation with a Frequency-Decoupled Diffusion Model 15 hours ago | arxiv.org

abstract arxiv cs.cv current +20

Continuous 3D Myocardial Motion Tracking via Echocardiography 15 hours ago | arxiv.org

abstract arxiv clinical continuous +17

Optimal Transport Aggregation for Visual Place Recognition 15 hours ago | arxiv.org

aggregation arxiv cs.cv recognition +4

BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning 15 hours ago | arxiv.org

abstract adapter agents arxiv +22

AutoProSAM: Automated Prompting SAM for 3D Multi-Organ Segmentation 15 hours ago | arxiv.org

abstract applications arxiv automated +23

LiverUSRecon: Automatic 3D Reconstruction and Volumetry of the Liver with a Few Partial Ultrasound Scans 15 hours ago | arxiv.org

3d reconstruction abstract acquisition analysis +10

ALMA: a mathematics-driven approach for determining tuning parameters in generalized LASSO problems, with applications to … 15 hours ago | arxiv.org

abstract acquisition applications artifacts +19

Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions 15 hours ago | arxiv.org

abstract agents arxiv cs.ai +21

AI Focused Biochemistry Postdoctoral Fellow

@ Lawrence Berkeley National Lab | Berkeley, CA

View on ai-jobs.net

Senior Quality Specialist - JAVA

@ SAP | Bengaluru, IN, 560066

View on ai-jobs.net

Aktuar Financial Lines (m/w/d)

@ Zurich Insurance | Köln, DE

View on ai-jobs.net

Senior Network Engineer

@ ManTech | 054H - 124TchnlgyPrkWy,SBurlington,VT

View on ai-jobs.net

Pricing Analyst

@ EDF | Exeter, GB

View on ai-jobs.net

Specialist IS Engineer

@ Amgen | US - California - Thousand Oaks - Field/Remote

View on ai-jobs.net