Don't be a Fool: Pooling Strategies in Offensive Language Detection from User-Intended Adversarial Attacks
March 26, 2024, 4:50 a.m. | Seunguk Yu, Juhwan Choi, Youngbin Kim
cs.CL updates on arXiv.org arxiv.org
Abstract: Offensive language detection is an important task for filtering out abusive expressions and improving online user experiences. However, malicious users often attempt to evade filtering systems by introducing textual noise. In this paper, we characterize these evasions as user-intended adversarial attacks that insert special symbols or exploit distinctive features of the Korean language. Furthermore, we introduce simple yet effective layer-wise pooling strategies to defend against the proposed attacks, focusing …
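The abstract's "layer-wise pooling" idea can be sketched roughly as follows. This is an illustrative assumption, not the paper's exact method: given hidden states from every encoder layer of a transformer classifier, tokens are pooled within each layer and the per-layer vectors are then combined (uniformly or with weights). The function names, shapes, and the mean/weighted combinations below are all hypothetical.

```python
import numpy as np

# Hypothetical sketch of layer-wise pooling for a sentence classifier.
# hidden_states: array of shape [num_layers, seq_len, hidden_dim],
# e.g. the stacked per-layer outputs of a transformer encoder.

def layerwise_mean_pool(hidden_states):
    """Mean-pool tokens within each layer, then average across layers."""
    per_layer = hidden_states.mean(axis=1)       # [num_layers, hidden_dim]
    return per_layer.mean(axis=0)                # [hidden_dim]

def layerwise_weighted_pool(hidden_states, layer_weights):
    """Combine per-layer pooled vectors with (e.g. learned) scalar weights."""
    per_layer = hidden_states.mean(axis=1)       # [num_layers, hidden_dim]
    w = np.asarray(layer_weights, dtype=float)
    w = w / w.sum()                              # normalize to sum to 1
    return (w[:, None] * per_layer).sum(axis=0)  # [hidden_dim]

# Toy example: 4 layers, 6 tokens, hidden size 8
rng = np.random.default_rng(0)
states = rng.normal(size=(4, 6, 8))
vec = layerwise_mean_pool(states)
print(vec.shape)  # (8,)
```

The resulting sentence vector would feed a classification head; the intuition is that lower layers retain surface-level (character/symbol) information that survives the noise-insertion attacks the abstract describes, while upper layers carry semantics.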