April 2, 2024, 7:52 p.m. | Yixu Wang, Yan Teng, Kexin Huang, Chengqi Lyu, Songyang Zhang, Wenwei Zhang, Xingjun Ma, Yu-Gang Jiang, Yu Qiao, Yingchun Wang

cs.CL updates on arXiv.org arxiv.org

arXiv:2311.05915v3 Announce Type: replace
Abstract: The growing awareness of safety concerns in large language models (LLMs) has sparked considerable interest in the evaluation of safety. This study investigates an under-explored issue about the evaluation of LLMs, namely the substantial discrepancy in performance between multiple-choice questions and open-ended questions. Inspired by research on jailbreak attack patterns, we argue this is caused by mismatched generalization. That is, LLM only remembers the answer style for open-ended safety questions, which makes it unable to …

alignment arxiv cs.ai cs.cl fake llms type

Data Scientist (m/f/x/d)

@ Symanto Research GmbH & Co. KG | Spain, Germany

Associate Data Analyst

@ Gartner | Stamford - 56 Top Gallant

Ecologist III (Wetland Scientist III)

@ AECOM | Pittsburgh, PA, United States

Senior Data Analyst

@ Publicis Groupe | Bengaluru, India

Data Analyst

@ Delivery Hero | Hong Kong, Hong Kong

Senior Data Engineer

@ ChargePoint | Bengaluru, India