Feb. 15, 2024, 5:42 a.m. | Yixin Cheng, Markos Georgopoulos, Volkan Cevher, Grigorios G. Chrysos

cs.LG updates on arXiv.org

arXiv:2402.09177v1 Announce Type: new
Abstract: Large Language Models (LLMs) are susceptible to jailbreaking attacks, which aim to extract harmful information by subtly modifying the attack query. As defense mechanisms evolve, it becomes increasingly difficult for jailbreaking attacks to obtain harmful information directly. In this work, inspired by the human practice of using indirect context to elicit harmful information, we focus on a new form of attack called the Contextual Interaction Attack. The idea relies on the autoregressive nature of the generation process in LLMs. We contend …


