March 27, 2024, 4:48 a.m. | Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Pinjia He, Shuming Shi, Zhaopeng Tu

cs.CL updates on arXiv.org arxiv.org

arXiv:2308.06463v2 Announce Type: replace
Abstract: Safety lies at the core of the development of Large Language Models (LLMs). There is ample work on aligning LLMs with human ethics and preferences, including data filtering in pretraining, supervised fine-tuning, reinforcement learning from human feedback, and red teaming, etc. In this study, we discover that chat in cipher can bypass the safety alignment techniques of LLMs, which are mainly conducted in natural languages. We propose a novel framework CipherChat to systematically examine the …

arxiv chat cipher cs.cl gpt gpt-4 llms smart type via

Data Scientist (m/f/x/d)

@ Symanto Research GmbH & Co. KG | Spain, Germany

Data Analyst

@ S&P Global | IN - HYDERABAD SKYVIEW

EY GDS Internship Program - Junior Data Visualization Engineer (June - July 2024)

@ EY | Wrocław, DS, PL, 50-086

Staff Data Scientist

@ ServiceTitan | INT Armenia Yerevan

Master thesis on deterministic AI inference on-board Telecom Satellites

@ Airbus | Taufkirchen / Ottobrunn

Lead Data Scientist

@ Picket | Seattle, WA