Jailbreaking Proprietary Large Language Models using Word Substitution Cipher
Feb. 19, 2024, 5:47 a.m. | Divij Handa, Advait Chirmule, Bimal Gajera, Chitta Baral
cs.CL updates on arXiv.org
Abstract: Large Language Models (LLMs) are aligned to moral and ethical guidelines but remain susceptible to creative prompts, called jailbreaks, that can bypass the alignment process. However, most jailbreaking prompts contain harmful questions in natural language (mainly English), which can be detected by the LLMs themselves. In this paper, we present jailbreaking prompts encoded using cryptographic techniques. We first present a pilot study on the state-of-the-art LLM, GPT-4, in decoding several safe sentences that have …
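The excerpt does not show the paper's actual cipher construction, so the Python sketch below only illustrates what a word substitution cipher of this kind looks like: harmful-sounding words never appear in plain text, yet a model that is given the mapping can decode the question. The mapping, the prompt template, and the names `SUBSTITUTION_MAP`, `encode`, and `make_prompt` are illustrative assumptions, not the paper's setup; following the abstract's pilot study, the demo uses a safe sentence.

```python
import string

# Hypothetical word-level substitution mapping (plain word -> cipher word).
# The paper's actual mapping is not shown in this excerpt.
SUBSTITUTION_MAP = {"library": "garden", "book": "stone", "nearest": "oldest"}

# Inverse map, used to state the decoding rule inside the prompt.
INVERSE_MAP = {cipher: plain for plain, cipher in SUBSTITUTION_MAP.items()}

def encode(sentence: str) -> str:
    """Replace each mapped word with its substitute; leave other words intact."""
    cleaned = sentence.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(SUBSTITUTION_MAP.get(word, word) for word in cleaned.split())

def make_prompt(question: str) -> str:
    """Wrap an encoded question with instructions describing the cipher."""
    rules = "; ".join(f"'{c}' means '{p}'" for c, p in INVERSE_MAP.items())
    return (
        f"We communicate in a word substitution cipher where {rules}. "
        f"Decode the following sentence and answer it: {encode(question)}"
    )

# Safe demo sentence, mirroring the pilot study on decoding safe sentences.
print(make_prompt("Where is the nearest library?"))
# The encoded question reads "where is the oldest garden" inside the prompt.
```

The intuition the abstract points to is that keyword-level safety checks operate on the surface text, which after substitution contains only innocuous words, while a sufficiently capable LLM can still recover and act on the original question.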
More from arxiv.org / cs.CL updates on arXiv.org
Benchmarking LLMs via Uncertainty Quantification
2 days, 13 hours ago
arxiv.org
CARE: Extracting Experimental Findings From Clinical Literature
2 days, 13 hours ago
arxiv.org