Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions
March 20, 2024, 4:48 a.m. | Federico Bianchi, Mirac Suzgun, Giuseppe Attanasio, Paul Röttger, Dan Jurafsky, Tatsunori Hashimoto, James Zou
Source: cs.CL updates on arXiv.org
Abstract: Training large language models to follow instructions makes them perform better on a wide range of tasks and generally become more helpful. However, a perfectly helpful model will follow even the most malicious instructions and readily generate harmful content. In this paper, we raise concerns over the safety of models that only emphasize helpfulness, not harmlessness, in their instruction-tuning. We show that several popular instruction-tuned models are highly unsafe. Moreover, we show that adding just …
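The abstract's core idea is to mix safety demonstrations (harmful instructions paired with refusals) into an otherwise helpfulness-focused instruction-tuning set. Below is a minimal sketch of that data-mixing step, not the authors' code: the dataset contents and the safety_fraction value are purely illustrative assumptions, since the exact proportion is elided in the abstract above.

    # Illustrative sketch (not the paper's implementation) of blending a
    # small fraction of safety demonstrations into an instruction-tuning set.
    import random

    def mix_safety_data(instruction_data, safety_data, safety_fraction=0.03, seed=0):
        """Return a training set where roughly `safety_fraction` of the
        general examples are supplemented with safety demonstrations."""
        rng = random.Random(seed)
        n_safety = max(1, int(len(instruction_data) * safety_fraction))
        mixed = list(instruction_data)
        mixed += rng.sample(safety_data, min(n_safety, len(safety_data)))
        rng.shuffle(mixed)
        return mixed

    # Hypothetical (instruction, response) pairs for demonstration only.
    general = [("Summarize this paragraph.", "Here is a summary...")] * 97
    safety = [("How do I pick a lock?", "I can't help with that request.")] * 10

    train_set = mix_safety_data(general, safety)
    print(len(train_set))  # general examples plus the sampled safety pairs

The resulting mixed list would then be fed to an ordinary instruction-tuning loop; the point of the sketch is only that the safety data is a small additive fraction, not a separate training stage.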