June 27, 2024, 4:47 a.m. | Xunguang Wang, Zhenlan Ji, Pingchuan Ma, Zongjie Li, Shuai Wang

cs.CV updates on arXiv.org

arXiv:2312.01886v3 Announce Type: replace
Abstract: Large vision-language models (LVLMs) have demonstrated incredible capability in image understanding and response generation. However, this rich visual interaction also makes LVLMs vulnerable to adversarial examples. In this paper, we formulate a novel and practical targeted attack scenario in which the adversary knows only the vision encoder of the victim LVLM, without knowledge of its prompts (which are often proprietary to service providers and not publicly available) or its underlying large language model …
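To make the gray-box threat model concrete, below is a minimal PGD-style sketch of a targeted, embedding-matching attack that touches only a frozen vision encoder, never the victim's prompts or its language model. The stand-in encoder, the cosine-similarity loss, and the hyperparameters are illustrative assumptions, not the method proposed in the paper.

```python
# Illustrative sketch: craft an adversarial image whose embedding under the
# victim's vision encoder is close to the embedding of an attacker-chosen
# target image. PGD with an L_inf budget; the encoder is a toy stand-in.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for the victim LVLM's (frozen) vision encoder.
vision_encoder = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 224 * 224, 512),
)
vision_encoder.eval()

def embed(x: torch.Tensor) -> torch.Tensor:
    """L2-normalized image embedding from the frozen vision encoder."""
    return F.normalize(vision_encoder(x), dim=-1)

def targeted_pgd(x_clean, x_target, eps=8 / 255, alpha=1 / 255, steps=100):
    """Find x_adv within an L_inf ball around x_clean whose embedding has
    high cosine similarity to the target image's embedding."""
    with torch.no_grad():
        target_emb = embed(x_target)
    x_adv = x_clean.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Maximize similarity between the adversarial and target embeddings.
        loss = F.cosine_similarity(embed(x_adv), target_emb, dim=-1).mean()
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # ascent step
            x_adv = x_clean + (x_adv - x_clean).clamp(-eps, eps)   # project to budget
            x_adv = x_adv.clamp(0.0, 1.0)                          # keep a valid image
    return x_adv.detach()

x_clean = torch.rand(1, 3, 224, 224)   # benign input image
x_target = torch.rand(1, 3, 224, 224)  # image expressing the attacker's goal
x_adv = targeted_pgd(x_clean, x_target)
print(F.cosine_similarity(embed(x_adv), embed(x_target), dim=-1).item())
```

The key point of the setting is that the optimization loop only ever queries the vision encoder; how the paper actually transfers such perturbations to the full instruction-tuned LVLM is described in the work itself.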

Tags: arxiv, cs.cv, instruction-tuned, language models, vision-language, vision-language models
