March 20, 2024, 4:45 a.m. | Sensen Gao, Xiaojun Jia, Xuhong Ren, Ivor Tsang, Qing Guo

arXiv:2403.12445v1 Announce Type: new
Abstract: Vision-language pre-training (VLP) models exhibit remarkable capabilities in comprehending both images and text, yet they remain susceptible to multimodal adversarial examples (AEs). Strengthening adversarial attacks and uncovering vulnerabilities, especially common issues in VLP models (e.g., high transferable AEs), can stimulate further research on constructing reliable and practical VLP models. A recent work (i.e., Set-level guidance attack) indicates that augmenting image-text pairs to increase AE diversity along the optimization path enhances the transferability of adversarial examples …
