A Comprehensive Study on NLP Data Augmentation for Hate Speech Detection: Legacy Methods, BERT, and LLMs | allainews.com

April 2, 2024, 7:51 p.m. | Md Saroar Jahan, Mourad Oussalah, Djamila Romaissa Beddia, Jhuma Kabir Mim, Nabil Arhab

Source: cs.CL updates on arXiv.org

arXiv:2404.00303v1 Announce Type: new
Abstract: The surge of interest in data augmentation within the realm of NLP has been driven by the need to address challenges posed by hate speech domains, the dynamic nature of social media vocabulary, and the demands for large-scale neural networks requiring extensive training data. However, the prevalent use of lexical substitution in data augmentation has raised concerns, as it may inadvertently alter the intended meaning, thereby impacting the efficacy of supervised machine learning models. In …
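The concern about lexical substitution can be illustrated with a minimal sketch. It assumes a hypothetical toy synonym table (`SYNONYMS`) and a hypothetical helper (`lexical_substitution`). Neither comes from the paper, and this is not the authors' method. Each word with a table entry is swapped for a random synonym, which is the basic pattern behind synonym-replacement augmentation.

```python
import random

# Hypothetical toy synonym table (an assumption for illustration only);
# real pipelines often draw candidates from WordNet or embedding neighbours.
SYNONYMS = {
    "hate": ["dislike", "detest"],
    "people": ["folks", "individuals"],
    "stupid": ["foolish", "silly"],
}


def lexical_substitution(text, p=0.5, seed=None):
    """Hypothetical helper: replace each word found in SYNONYMS with a
    randomly chosen synonym with probability p; keep every other token."""
    rng = random.Random(seed)
    out = []
    for token in text.split():
        key = token.lower()
        if key in SYNONYMS and rng.random() < p:
            out.append(rng.choice(SYNONYMS[key]))
        else:
            out.append(token)
    return " ".join(out)


original = "I hate those stupid people"
# p=1.0 forces every word that has synonyms to be replaced.
augmented = [lexical_substitution(original, p=1.0, seed=s) for s in range(3)]
for sentence in augmented:
    print(sentence)
```

Whenever "hate" becomes "dislike", the augmented copy is noticeably milder than the original. It may no longer deserve the hate-speech label it inherits, so a supervised model trained on it learns from a mislabeled example. This label drift is the kind of meaning change the abstract warns about.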