Feb. 15, 2024, 5:42 a.m. | Yixin Cheng, Markos Georgopoulos, Volkan Cevher, Grigorios G. Chrysos

cs.LG updates on arXiv.org arxiv.org

arXiv:2402.09177v1 Announce Type: new
Abstract: Large Language Models (LLMs) are susceptible to Jailbreaking attacks, which aim to extract harmful information by subtly modifying the attack query. As defense mechanisms evolve, directly obtaining harmful information becomes increasingly challenging for Jailbreaking attacks. In this work, inspired by human practices of indirect context to elicit harmful information, we focus on a new attack form called Contextual Interaction Attack. The idea relies on the autoregressive nature of the generation process in LLMs. We contend …

abstract aim arxiv attacks context cs.ai cs.cl cs.lg defense extract human information interactions jailbreaking language language models large language large language models llms practices query through type work

Doctoral Researcher (m/f/div) in Automated Processing of Bioimages

@ Leibniz Institute for Natural Product Research and Infection Biology (Leibniz-HKI) | Jena

Seeking Developers and Engineers for AI T-Shirt Generator Project

@ Chevon Hicks | Remote

Software Engineer for AI Training Data (School Specific)

@ G2i Inc | Remote

Software Engineer for AI Training Data (Python)

@ G2i Inc | Remote

Software Engineer for AI Training Data (Tier 2)

@ G2i Inc | Remote

Business Intelligence Analyst Insights & Reporting

@ Bertelsmann | Hilversum, NH, NL, 1217WP