all AI news
Towards Building a Robust Toxicity Predictor
April 16, 2024, 4:42 a.m. | Dmitriy Bespalov, Sourav Bhabesh, Yi Xiang, Liutong Zhou, Yanjun Qi
cs.LG updates on arXiv.org arxiv.org
Abstract: Recent NLP literature pays little attention to the robustness of toxicity language predictors, while these systems are most likely to be used in adversarial contexts. This paper presents a novel adversarial attack, \texttt{ToxicTrap}, introducing small word-level perturbations to fool SOTA text classifiers to predict toxic text samples as benign. ToxicTrap exploits greedy based search strategies to enable fast and effective generation of toxic adversarial examples. Two novel goal function designs allow ToxicTrap to identify weaknesses …
abstract adversarial arxiv attention building classifiers cs.ai cs.cl cs.cr cs.lg language literature nlp novel paper robust robustness samples small sota systems text toxicity type word
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
Data Architect
@ University of Texas at Austin | Austin, TX
Data ETL Engineer
@ University of Texas at Austin | Austin, TX
Lead GNSS Data Scientist
@ Lurra Systems | Melbourne
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
C003549 Data Analyst (NS) - MON 13 May
@ EMW, Inc. | Braine-l'Alleud, Wallonia, Belgium
Marketing Decision Scientist
@ Meta | Menlo Park, CA | New York City