March 20, 2024, 4:48 a.m. | Federico Bianchi, Mirac Suzgun, Giuseppe Attanasio, Paul Röttger, Dan Jurafsky, Tatsunori Hashimoto, James Zou

cs.CL updates on arXiv.org

arXiv:2309.07875v3 Announce Type: replace
Abstract: Training large language models to follow instructions makes them perform better on a wide range of tasks and generally become more helpful. However, a perfectly helpful model will follow even the most malicious instructions and readily generate harmful content. In this paper, we raise concerns over the safety of models that only emphasize helpfulness, not harmlessness, in their instruction-tuning. We show that several popular instruction-tuned models are highly unsafe. Moreover, we show that adding just …
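The truncated sentence above points at the paper's central intervention: mixing a small number of safety demonstrations (refusals to harmful instructions) into the instruction-tuning data. As a minimal sketch of that idea, and not the authors' actual pipeline, the snippet below blends a small fraction of safety examples into an instruction-tuning set before fine-tuning; the function name, record format, and the 3% ratio are illustrative assumptions.

import random

def mix_safety_examples(instruction_data, safety_data, safety_fraction=0.03, seed=0):
    """Return a shuffled training set where roughly `safety_fraction`
    of the examples are safety demonstrations. Hypothetical helper;
    the ratio and record schema are assumptions for this sketch."""
    rng = random.Random(seed)
    n_safety = int(len(instruction_data) * safety_fraction)
    sampled_safety = rng.sample(safety_data, min(n_safety, len(safety_data)))
    mixed = instruction_data + sampled_safety
    rng.shuffle(mixed)
    return mixed

# Toy usage with placeholder records.
instruction_data = [{"prompt": f"task {i}", "response": "..."} for i in range(1000)]
safety_data = [{"prompt": f"harmful request {i}", "response": "I can't help with that."}
               for i in range(100)]

train_set = mix_safety_examples(instruction_data, safety_data)
print(len(train_set))  # 1030 examples, about 3% of them safety demonstrations

The point of the sketch is only that the safety data is a small additive slice of the fine-tuning mixture, not a separate training stage.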

