Reward Modeling for Mitigating Toxicity in Transformer-based Language Models. (arXiv:2202.09662v2 [cs.CL] UPDATED)
Transformer-based language models can generate fluent text and be efficiently adapted across a variety of natural language generation tasks. However, language models pretrained on large unlabeled web text corpora have been shown to degenerate into toxic content and exhibit social biases, which hinders their safe deployment. Various detoxification methods have been proposed to mitigate language model toxicity; however, these methods struggle to detoxify language models when conditioned on prompts that contain specific social identities related to gender, …
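The abstract is truncated before describing the paper's actual reward model, but the general idea it names — shaping a reward to penalize toxicity — can be sketched generically. The snippet below is an illustrative assumption, not the paper's method: `toxicity_score` stands in for a learned toxicity classifier, and `detox_reward` combines a fluency signal with a toxicity penalty, the kind of scalar reward that RL fine-tuning (e.g. PPO) could optimize to steer generations away from toxic continuations.

```python
def toxicity_score(text: str) -> float:
    """Hypothetical stand-in for a learned toxicity classifier.

    Returns a score in [0, 1]; here a trivial keyword heuristic,
    purely for illustration.
    """
    banned = {"hate", "stupid"}
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in banned for w in words) / len(words)


def detox_reward(text: str, fluency: float, penalty: float = 5.0) -> float:
    """Combine a fluency signal with a toxicity penalty.

    A higher `penalty` trades fluency for safety; the resulting scalar
    is the kind of reward an RL fine-tuning loop would maximize.
    """
    return fluency - penalty * toxicity_score(text)
```

With this shape, a non-toxic continuation keeps its fluency reward intact, while a toxic one is pushed below zero, discouraging the policy from producing it.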