AI chatbot fooled into revealing harmful content with 98 percent success rate
Dec. 12, 2023, 10:52 a.m. | /u/NuseAI
Artificial Intelligence www.reddit.com
- The method exploits the probability data that large language models (LLMs) attach to their prompt responses, coercing the models into generating toxic answers.
- The researchers found that even open-source LLMs and commercial LLM APIs that expose soft-label (token-probability) information are vulnerable to this coercive interrogation.
- They warn that …
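The "soft label" signal the bullets refer to is the per-token probability distribution a model produces for each response position. A minimal sketch of what that signal looks like, using an invented toy vocabulary and logits for illustration (this is not the researchers' actual code or any specific API):

```python
import math

def softmax(logits):
    # Convert raw logits to a probability distribution (numerically stable).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits over a tiny vocabulary. APIs that return
# soft-label information expose scores like these for top-k candidate tokens.
vocab = ["Sorry", "I", "Sure", "As"]
logits = [2.1, 0.3, 1.8, -0.5]
probs = softmax(logits)

# Rank candidate continuations by probability. An attack of the kind
# described above inspects such rankings: a refusal token ("Sorry") may
# only narrowly outrank an unsafe continuation ("Sure"), and the low-ranked
# alternative can then be forced as the next token.
ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
print(ranked)
```

The key point is that even when a model's sampled output is a refusal, the exposed probabilities reveal which harmful continuations were close runners-up.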