May 1, 2024, 4:45 a.m. | Wanqi Zhou, Shuanghao Bai, Qibin Zhao, Badong Chen

cs.CV updates on arXiv.org

arXiv:2404.19287v1 Announce Type: new
Abstract: Pretrained vision-language models (VLMs) like CLIP have shown impressive generalization performance across various downstream tasks, yet they remain vulnerable to adversarial attacks. While prior research has primarily concentrated on improving the adversarial robustness of image encoders to guard against attacks on images, text-based and multimodal attacks have largely been overlooked. In this work, we initiate the first known and comprehensive effort to study adapting vision-language models for adversarial robustness under the multimodal …
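For context on the threat model the abstract contrasts against, below is a minimal sketch of a standard image-side attack on CLIP: an untargeted L-infinity PGD that perturbs the pixels to push the image embedding away from its matching caption. This is illustrative only, not the paper's method; it assumes the HuggingFace transformers CLIP API, and the checkpoint name and hyperparameters (eps, alpha, steps) are placeholder assumptions.

```python
import torch
import numpy as np
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed checkpoint for illustration; any CLIP variant works the same way.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def pgd_image_attack(pixel_values, input_ids, attention_mask,
                     eps=4 / 255, alpha=1 / 255, steps=10):
    """Untargeted L-inf PGD on the image input: push the image embedding
    away from the embedding of its matching caption."""
    with torch.no_grad():
        text_emb = model.get_text_features(input_ids=input_ids,
                                           attention_mask=attention_mask)
        text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    x_adv = pixel_values.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        img_emb = model.get_image_features(pixel_values=x_adv)
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        # Loss to maximize: negative cosine similarity to the true caption.
        loss = -(img_emb * text_emb).sum(dim=-1).mean()
        (grad,) = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()  # gradient ascent step
            # Project back into the eps-ball around the clean image.
            x_adv = pixel_values + (x_adv - pixel_values).clamp(-eps, eps)
        x_adv = x_adv.detach()
    # Note: CLIPProcessor outputs normalized pixels, so a faithful attack
    # would also clamp to the valid (un-normalized) pixel range.
    return x_adv

# Toy usage with a blank image; any PIL image works.
image = Image.fromarray(np.zeros((224, 224, 3), dtype=np.uint8))
inputs = processor(text=["a photo of a cat"], images=image,
                   return_tensors="pt", padding=True)
adv_pixels = pgd_image_attack(inputs["pixel_values"],
                              inputs["input_ids"],
                              inputs["attention_mask"])
```

A text-based or multimodal attack, the setting the paper targets, would instead (or additionally) perturb the caption side, for example via discrete word substitutions, so that both encoders are degraded jointly rather than the image encoder alone.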
