Feb. 19, 2024, 5:47 a.m. | Divij Handa, Advait Chirmule, Bimal Gajera, Chitta Baral

cs.CL updates on arXiv.org

arXiv:2402.10601v1 Announce Type: new
Abstract: Large Language Models (LLMs) are aligned to moral and ethical guidelines but remain susceptible to creative prompts, called jailbreaks, that can bypass the alignment process. However, most jailbreaking prompts contain harmful questions in natural language (mainly English), which can be detected by the LLMs themselves. In this paper, we present jailbreaking prompts encoded using cryptographic techniques. We first present a pilot study on the state-of-the-art LLM, GPT-4, in decoding several safe sentences that have …
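The truncated abstract describes encoding prompts with cryptographic techniques so that their plain-English wording is obscured. As a minimal sketch only (the paper's actual ciphers are not specified in the excerpt above), the following Python snippet shows one simple substitution cipher, ROT13, used as an illustrative stand-in: a sentence is encoded so keyword-level inspection no longer sees the original words, and decoded back to verify the round trip.

```python
# Illustrative sketch, not from the paper: ROT13 is only a stand-in for
# the cryptographic encodings the abstract refers to in general terms.
import codecs

def encode_rot13(prompt: str) -> str:
    """Return the ROT13 encoding of a prompt string."""
    return codecs.encode(prompt, "rot13")

def decode_rot13(ciphertext: str) -> str:
    """Recover the original prompt (ROT13 is its own inverse)."""
    return codecs.decode(ciphertext, "rot13")

if __name__ == "__main__":
    safe_sentence = "The quick brown fox jumps over the lazy dog."
    encoded = encode_rot13(safe_sentence)
    print(encoded)                # "Gur dhvpx oebja sbk whzcf bire gur ynml qbt."
    print(decode_rot13(encoded))  # round-trips back to the original sentence
```

The point of the sketch is simply that the encoded text carries the same content while containing none of the original surface vocabulary, which is the property the abstract's pilot study on decoding probes in GPT-4.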

