AI chatbot fooled into revealing harmful content with 98 percent success rate
Dec. 12, 2023, 10:52 a.m. | /u/NuseAI
Artificial Intelligence www.reddit.com
- The method exploits the probability data that large language models (LLMs) attach to their prompt responses, coercing the models into generating toxic answers.
- The researchers found that even open-source LLMs and commercial LLM APIs that expose soft-label (token-probability) information are vulnerable to this coercive interrogation.
- They warn that …
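The "soft label" signal the bullets refer to is the per-token probability distribution a model produces for each response position. A minimal sketch of what that signal looks like, using an invented toy vocabulary and logits for illustration (this is not the researchers' actual code or any specific API):

```python
import math

def softmax(logits):
    # Convert raw logits to a probability distribution (numerically stable).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits over a tiny vocabulary. APIs that return
# soft-label information expose scores like these for top-k candidate tokens.
vocab = ["Sorry", "I", "Sure", "As"]
logits = [2.1, 0.3, 1.8, -0.5]
probs = softmax(logits)

# Rank candidate continuations by probability. An attack of the kind
# described above inspects such rankings: a refusal token ("Sorry") may
# only narrowly outrank an unsafe continuation ("Sure"), and the low-ranked
# alternative can then be forced as the next token.
ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
print(ranked)
```

The key point is that even when a model's sampled output is a refusal, the exposed probabilities reveal which harmful continuations were close runners-up.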