Korean Online Hate Speech Dataset for Multilabel Classification: How Can Social Science Aid Developing Better Hate Speech Dataset? (arXiv:2204.03262v1 [cs.CL])
cs.CL updates on arXiv.org arxiv.org
We propose a multilabel Korean online hate speech dataset that covers seven
categories of hate speech: (1) Race and Nationality, (2) Religion, (3)
Regionalism, (4) Ageism, (5) Misogyny, (6) Sexual Minorities, and (7) Male. Our
35K dataset consists of 24K online comments with an inter-annotator label
agreement (Krippendorff's alpha) of 0.713, 2.2K neutral sentences from
Wikipedia, 1.7K additionally labeled sentences generated by a
Human-in-the-Loop procedure, and 7.1K rule-generated neutral sentences. The
base model trained on the initial 24K dataset achieved an accuracy of LRAP …
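
The abstract reports the base model's score as LRAP, which we read here as label ranking average precision, a common metric for multilabel classifiers. The sketch below is a minimal pure-Python version of that metric, not the authors' evaluation code. The function name `lrap`, the example comment, and its scores are hypothetical. Giving a sample with no relevant labels (such as a neutral sentence) a score of 1.0 is an assumption borrowed from scikit-learn's convention.

```python
# The seven categories named in the abstract.
LABELS = ["Race and Nationality", "Religion", "Regionalism", "Ageism",
          "Misogyny", "Sexual Minorities", "Male"]


def lrap(y_true, y_score):
    """Label ranking average precision over a batch of samples.

    y_true:  list of 0/1 lists, one indicator per label.
    y_score: list of float lists, one model score per label.
    """
    total = 0.0
    for truth, scores in zip(y_true, y_score):
        relevant = [j for j, t in enumerate(truth) if t]
        # Assumption: samples with no relevant labels (or all labels
        # relevant) count as 1.0, following scikit-learn's convention.
        if not relevant or len(relevant) == len(truth):
            total += 1.0
            continue
        sample = 0.0
        for j in relevant:
            # Rank of label j: how many labels score at least as high.
            rank = sum(1 for s in scores if s >= scores[j])
            # How many of those are themselves relevant labels.
            hits = sum(1 for k in relevant if scores[k] >= scores[j])
            sample += hits / rank
        total += sample / len(relevant)
    return total / len(y_true)


# Hypothetical comment tagged with Ageism and Misogyny, plus made-up scores.
truth = [[0, 0, 0, 1, 1, 0, 0]]
scores = [[0.05, 0.10, 0.02, 0.60, 0.80, 0.30, 0.01]]
print(lrap(truth, scores))
```

A value of 1.0 means every true label is ranked above every false one; lower values mean false labels are outranking true ones.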