DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers

Feb. 28, 2024, 5:49 a.m. | Xirui Li, Ruochen Wang, Minhao Cheng, Tianyi Zhou, Cho-Jui Hsieh

cs.CL updates on arXiv.org arxiv.org

arXiv:2402.16914v1 Announce Type: cross
Abstract: The safety alignment of Large Language Models (LLMs) is vulnerable to both manual and automated jailbreak attacks, which adversarially trigger LLMs to output harmful content. However, current methods for jailbreaking LLMs, which nest entire harmful prompts, are not effective at concealing malicious intent and can be easily identified and rejected by well-aligned LLMs. This paper discovers that decomposing a malicious prompt into separated sub-prompts can effectively obscure its underlying malicious intent by presenting it in …
