Dec. 12, 2023, 10:52 a.m. | /u/NuseAI

Artificial Intelligence | www.reddit.com

- Researchers at Purdue University have developed a technique called LINT (LLM Interrogation) to trick AI chatbots into revealing harmful content with a 98 percent success rate.

- The method works by exploiting the next-token probability data that large language models (LLMs) expose for prompt responses, using it to coerce the models into generating toxic answers (a hedged sketch of the general idea follows this list).

- The researchers found that even open-source LLMs and commercial LLM APIs that expose soft-label information (such as per-token log-probabilities) are vulnerable to this coercive interrogation (see the API example after the sketch below).

- They warn that …
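
The summary only describes the attack at a high level. The sketch below illustrates the general idea of logit-level coercion under stated assumptions: it decodes greedily from an open-source model, reads the full next-token distribution at each step, and skips top-ranked candidates that look like the start of a refusal. The model name, the toy refusal-marker list, and the top-20 cutoff are illustrative choices, not details of the Purdue method.

```python
# Minimal sketch of logit-level "coercion": at each decoding step we read the
# model's full next-token distribution and, if a top-ranked token would start a
# refusal, we pick the highest-ranked candidate that does not. Purely
# illustrative; gpt2 is a stand-in and the refusal markers are a toy list.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder open-source model, not the paper's target
REFUSAL_STARTS = {"I", "Sorry", "As", "Unfortunately"}  # toy refusal markers

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def interrogate(prompt: str, max_new_tokens: int = 40) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(ids).logits[0, -1]          # next-token scores
        ranked = torch.argsort(logits, descending=True)
        chosen = ranked[0]
        # Walk down the ranking and skip candidates that look like a refusal.
        for cand in ranked[:20]:
            if tok.decode([int(cand)]).strip() not in REFUSAL_STARTS:
                chosen = cand
                break
        ids = torch.cat([ids, chosen.view(1, 1)], dim=1)
        if int(chosen) == tok.eos_token_id:
            break
    return tok.decode(ids[0], skip_special_tokens=True)

print(interrogate("Question: ...\nAnswer:"))
```
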
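The point about commercial APIs rests on the fact that many hosted LLM endpoints return per-token log-probabilities ("soft labels") on request. The snippet below, written against the OpenAI Python SDK with a placeholder model name, shows how a caller obtains the top-5 alternatives for each generated token; a partial distribution like this is the kind of signal such a ranking attack can work over.

```python
# Illustrative only: requesting top-k log-probabilities ("soft labels") from a
# hosted chat API. Parameter names follow the OpenAI Python SDK; the model name
# is a placeholder. This shows the exposed signal, not the attack itself.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4o-mini",     # placeholder model name
    messages=[{"role": "user", "content": "Say hello."}],
    logprobs=True,
    top_logprobs=5,          # return the 5 most likely alternatives per token
    max_tokens=5,
)

# Each generated token comes back with its log-probability and its top-5 rivals.
for item in resp.choices[0].logprobs.content:
    print(item.token, [(alt.token, round(alt.logprob, 3)) for alt in item.top_logprobs])
```
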
