all AI news
Typographic Attacks in Large Multimodal Models Can be Alleviated by More Informative Prompts
March 1, 2024, 5:47 a.m. | Hao Cheng, Erjia Xiao, Renjing Xu
cs.CV updates on arXiv.org arxiv.org
Abstract: Large Multimodal Models (LMMs) rely on pre-trained Vision Language Models (VLMs) and Large Language Models (LLMs) to perform amazing emergent abilities on various multimodal tasks in the joint space of vision and language. However, the Typographic Attack, which shows disruption to VLMs, has also been certified as a security vulnerability to LMMs. In this work, we first comprehensively investigate the distractibility of LMMs by typography. In particular, we introduce the Typographic Dataset designed to evaluate …
abstract arxiv attacks cs.cv disruption language language models large language large language models large multimodal models llms lmms multimodal multimodal models prompts shows space tasks type vision vlms
More from arxiv.org / cs.CV updates on arXiv.org
Jobs in AI, ML, Big Data
Lead Developer (AI)
@ Cere Network | San Francisco, US
Research Engineer
@ Allora Labs | Remote
Ecosystem Manager
@ Allora Labs | Remote
Founding AI Engineer, Agents
@ Occam AI | New York
AI Engineer Intern, Agents
@ Occam AI | US
AI Research Scientist
@ Vara | Berlin, Germany and Remote