Web: http://arxiv.org/abs/2204.11790

May 5, 2022, 1:11 a.m. | Howard Chen, Jacqueline He, Karthik Narasimhan, Danqi Chen

cs.CL updates on arXiv.org

A growing line of work has investigated neural NLP models that can produce
rationales: subsets of the input that explain the model's predictions. In
this paper, we ask whether such rationale models can also provide robustness
to adversarial attacks, in addition to being interpretable. Since these
models must first generate a rationale (the "rationalizer") before making a
prediction (the "predictor"), they have the potential to ignore noise or
adversarially added text by simply masking it out of the generated
rationale. …
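
The rationalizer-then-predictor pipeline described above can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the authors' method: the keyword lexicon, the scoring rule, the threshold, and every function name below are hypothetical stand-ins for what would be learned neural components. It shows only the mechanism the abstract names, in which the predictor sees just the tokens the rationalizer keeps, so inserted text that is masked out cannot affect the prediction.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy lexicon; a real rationalizer would be a trained model.
SENTIMENT_WORDS: Dict[str, int] = {
    "great": 1, "wonderful": 1, "awful": -1, "boring": -1,
}


def rationalize(tokens: List[str],
                score: Callable[[str], float],
                threshold: float = 0.5) -> List[int]:
    """Return a binary mask: 1 keeps a token in the rationale, 0 masks it out."""
    return [1 if score(tok) >= threshold else 0 for tok in tokens]


def predict(tokens: List[str],
            mask: List[int],
            classify: Callable[[List[str]], str]) -> str:
    """Classify using only the tokens selected by the mask."""
    rationale = [tok for tok, keep in zip(tokens, mask) if keep == 1]
    return classify(rationale)


def toy_score(token: str) -> float:
    # Assumption: a token is relevant only if it appears in the toy lexicon.
    return 1.0 if token.lower() in SENTIMENT_WORDS else 0.0


def toy_classify(rationale: List[str]) -> str:
    # Sum lexicon polarities over the rationale; ties default to "positive".
    total = sum(SENTIMENT_WORDS.get(tok.lower(), 0) for tok in rationale)
    return "positive" if total >= 0 else "negative"


def rationale_predict(text: str) -> Tuple[str, List[int]]:
    """Run the two-stage pipeline on whitespace-tokenized text."""
    tokens = text.split()
    mask = rationalize(tokens, toy_score)
    return predict(tokens, mask, toy_classify), mask


clean_label, clean_mask = rationale_predict("the movie was great")
attacked_label, attacked_mask = rationale_predict(
    "the movie was great xqz zzz filler"
)
```

In this toy, the appended tokens receive a mask value of 0, so both inputs give the predictor the same rationale. Whether a learned rationalizer masks adversarial text this reliably is the question the paper poses, and the sketch does not answer it.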


