all AI news
Enhancing Large Language Models Against Inductive Instructions with Dual-critique Prompting
March 8, 2024, 5:48 a.m. | Rui Wang, Hongru Wang, Fei Mi, Yi Chen, Boyang Xue, Kam-Fai Wong, Ruifeng Xu
cs.CL updates on arXiv.org arxiv.org
Abstract: Numerous works are proposed to align large language models (LLMs) with human intents to better fulfill instructions, ensuring they are trustful and helpful. Nevertheless, some human instructions are often malicious or misleading and following them will lead to untruthful and unsafe responses. Previous work rarely focused on understanding how LLMs manage instructions based on counterfactual premises, referred to here as \textit{inductive instructions}, which may stem from users' false beliefs or malicious intents. In this paper, …
abstract arxiv critique cs.cl human inductive language language models large language large language models llms prompting responses them type will work
More from arxiv.org / cs.CL updates on arXiv.org
Benchmarking LLMs via Uncertainty Quantification
2 days, 2 hours ago |
arxiv.org
CARE: Extracting Experimental Findings From Clinical Literature
2 days, 2 hours ago |
arxiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
Principal Data Engineering Manager
@ Microsoft | Redmond, Washington, United States
Machine Learning Engineer
@ Apple | San Diego, California, United States