May 1, 2024, 4:45 a.m. | Wanqi Zhou, Shuanghao Bai, Qibin Zhao, Badong Chen

cs.CV updates on arXiv.org

arXiv:2404.19287v1 Announce Type: new
Abstract: Pretrained vision-language models (VLMs) like CLIP have shown impressive generalization performance across various downstream tasks, yet they remain vulnerable to adversarial attacks. While prior research has primarily concentrated on improving the adversarial robustness of image encoders to guard against attacks on images, text-based and multimodal attacks have largely been overlooked. In this work, we initiate the first known and comprehensive effort to study adapting vision-language models for adversarial robustness under the multimodal …
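For context on the threat model the abstract contrasts against, below is a minimal sketch of a standard image-side attack on CLIP: an untargeted L-infinity PGD that perturbs the pixels to push the image embedding away from its matching caption. This is illustrative only, not the paper's method; it assumes the HuggingFace transformers CLIP API, and the checkpoint name and hyperparameters (eps, alpha, steps) are placeholder assumptions.

```python
import torch
import numpy as np
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed checkpoint for illustration; any CLIP variant works the same way.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def pgd_image_attack(pixel_values, input_ids, attention_mask,
                     eps=4 / 255, alpha=1 / 255, steps=10):
    """Untargeted L-inf PGD on the image input: push the image embedding
    away from the embedding of its matching caption."""
    with torch.no_grad():
        text_emb = model.get_text_features(input_ids=input_ids,
                                           attention_mask=attention_mask)
        text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    x_adv = pixel_values.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        img_emb = model.get_image_features(pixel_values=x_adv)
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        # Loss to maximize: negative cosine similarity to the true caption.
        loss = -(img_emb * text_emb).sum(dim=-1).mean()
        (grad,) = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()  # gradient ascent step
            # Project back into the eps-ball around the clean image.
            x_adv = pixel_values + (x_adv - pixel_values).clamp(-eps, eps)
        x_adv = x_adv.detach()
    # Note: CLIPProcessor outputs normalized pixels, so a faithful attack
    # would also clamp to the valid (un-normalized) pixel range.
    return x_adv

# Toy usage with a blank image; any PIL image works.
image = Image.fromarray(np.zeros((224, 224, 3), dtype=np.uint8))
inputs = processor(text=["a photo of a cat"], images=image,
                   return_tensors="pt", padding=True)
adv_pixels = pgd_image_attack(inputs["pixel_values"],
                              inputs["input_ids"],
                              inputs["attention_mask"])
```

A text-based or multimodal attack, the setting the paper targets, would instead (or additionally) perturb the caption side, for example via discrete word substitutions, so that both encoders are degraded jointly rather than the image encoder alone.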
