Web: http://arxiv.org/abs/2204.11790

May 5, 2022, 1:11 a.m. | Howard Chen, Jacqueline He, Karthik Narasimhan, Danqi Chen

cs.CL updates on arXiv.org

A growing line of work has investigated the development of neural NLP models
that can produce rationales -- subsets of the input that explain the model's
predictions. In this paper, we ask whether such rationale models can also
provide robustness to adversarial attacks, in addition to being interpretable.
Since these models must first generate a rationale ("rationalizer") before
making a prediction ("predictor"), they have the potential to ignore noise or
adversarially added text simply by masking it out of the generated rationale. …
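
The rationalize-then-predict pipeline the abstract describes is easy to illustrate. Below is a minimal, hypothetical sketch -- not the authors' model -- in which a toy rationalizer produces a binary mask over tokens and the predictor classifies using only the kept tokens, so adversarially appended text that gets masked out cannot change the prediction. All function names and the keyword heuristic are illustrative stand-ins for learned neural components.

```python
# Minimal sketch of a rationalize-then-predict pipeline (toy stand-ins for
# the learned rationalizer and predictor described in the paper).

from typing import List, Tuple


def rationalizer(tokens: List[str]) -> List[int]:
    """Toy rationalizer: returns a 0/1 mask selecting tokens it deems
    task-relevant (a keyword heuristic stands in for a learned extractor)."""
    task_vocabulary = {"plot", "acting", "boring", "brilliant", "terrible"}
    return [1 if t.lower() in task_vocabulary else 0 for t in tokens]


def predictor(tokens: List[str], mask: List[int]) -> str:
    """Toy predictor: classifies sentiment from the kept tokens only, so
    masked-out (e.g. adversarially inserted) text has no effect."""
    kept = [t.lower() for t, m in zip(tokens, mask) if m == 1]
    positive, negative = {"brilliant"}, {"boring", "terrible"}
    score = sum(t in positive for t in kept) - sum(t in negative for t in kept)
    return "positive" if score >= 0 else "negative"


def rationale_model(text: str) -> Tuple[str, List[str]]:
    """Full pipeline: rationalize first, then predict from the rationale."""
    tokens = text.split()
    mask = rationalizer(tokens)
    label = predictor(tokens, mask)
    rationale = [t for t, m in zip(tokens, mask) if m == 1]
    return label, rationale


if __name__ == "__main__":
    clean = "the acting was brilliant"
    # Adversarial suffix appended to the input; a rationalizer that leaves it
    # out of the rationale leaves the prediction unchanged.
    attacked = clean + " zzz irrelevant distractor text"
    print(rationale_model(clean))
    print(rationale_model(attacked))
```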

arxiv robustness
