March 8, 2024, 5:48 a.m. | Rui Wang, Hongru Wang, Fei Mi, Yi Chen, Boyang Xue, Kam-Fai Wong, Ruifeng Xu

cs.CL updates on arXiv.org

arXiv:2305.13733v2 Announce Type: replace
Abstract: Numerous works have been proposed to align large language models (LLMs) with human intent so that they fulfill instructions more truthfully and helpfully. Nevertheless, some human instructions are malicious or misleading, and following them leads to untruthful and unsafe responses. Previous work has rarely examined how LLMs handle instructions built on counterfactual premises, referred to here as "inductive instructions", which may stem from users' false beliefs or malicious intents. In this paper, …

