BiasKG: Adversarial Knowledge Graphs to Induce Bias in Large Language Models

May 9, 2024, 4:42 a.m. | Chu Fei Luo, Ahmad Ghawanmeh, Xiaodan Zhu, Faiza Khan Khattak

cs.LG updates on arXiv.org

arXiv:2405.04756v1 Announce Type: cross
Abstract: Modern large language models (LLMs) hold a significant amount of world knowledge, which enables strong performance in commonsense reasoning and knowledge-intensive tasks when harnessed properly. Language models can also learn social biases, which carry significant potential for societal harm. Many mitigation strategies have been proposed for LLM safety, but it is unclear how effective they are at eliminating social biases. In this work, we propose a new methodology for attacking language models …


