Feb. 1, 2024, 10:45 p.m. | /u/SatisfyingLatte

Machine Learning www.reddit.com

Paper: [https://arxiv.org/abs/2401.17263](https://arxiv.org/abs/2401.17263)


Abstract: Despite advances in AI alignment, language models (LMs) remain vulnerable to adversarial attacks or jailbreaking, in which adversaries modify input prompts to induce harmful behavior. While some defenses have been proposed, they focus on narrow threat models and fall short of a strong defense, which we posit should be effective, universal, and practical. To achieve this, we propose the first adversarial objective for defending LMs against jailbreaking attacks and an algorithm, robust prompt optimization (RPO), that uses …
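
The "adversarial objective" in the abstract reads like a minimax problem: optimize a defensive prompt (e.g. a suffix) so that, even under the worst-case jailbreak modification allowed by the threat model, the model is still pushed toward a safe response. A minimal sketch of that kind of objective, in my own notation rather than the paper's (the symbols s, a, A, and y_safe are assumptions here):

```latex
% Minimax defense objective (hypothetical notation, sketched from the abstract):
%   x        -- harmful user prompt
%   a in A   -- adversarial (jailbreak) modification drawn from the threat set A
%   s        -- defensive suffix the defender optimizes
%   y_safe   -- a safe / refusal target completion
\min_{s}\; \max_{a \in \mathcal{A}}\; -\log p_{\theta}\!\left(y_{\text{safe}} \mid x \oplus a \oplus s\right)
```

In words: the defender minimizes the worst-case loss of producing the safe target over all attacks in the threat set, rather than the loss under any single fixed attack. The details of how RPO actually instantiates and optimizes this are in the paper.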

