April 4, 2024, 4:48 a.m. | Ashim Gupta, Rishanth Rajendhran, Nathan Stringham, Vivek Srikumar, Ana Marasović

cs.CL updates on arXiv.org

arXiv:2311.09694v2 Announce Type: replace
Abstract: Do larger and more performant models resolve NLP's longstanding robustness issues? We investigate this question using over 20 models of varying sizes, spanning a range of architectural choices and pretraining objectives. We conduct evaluations using (a) out-of-domain and challenge test sets, (b) behavioral testing with CheckLists, (c) contrast sets, and (d) adversarial inputs. Our analysis reveals that not all out-of-domain tests provide insight into robustness. Evaluating with CheckLists and contrast sets shows significant gaps in model performance; …
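The contrast-set evaluation mentioned in (c) can be sketched in a few lines: score a model on original examples and on minimally perturbed "contrast" examples whose gold labels flip, then compare accuracies. The `toy_model` below is a hypothetical keyword classifier used purely for illustration, not one of the paper's models.

```python
# Sketch of contrast-set evaluation: measure the accuracy drop between
# original examples and minimally edited examples with flipped labels.

def toy_model(text: str) -> str:
    # Hypothetical sentiment classifier that keys on a single word,
    # so it ignores negation -- a classic brittleness.
    return "positive" if "good" in text else "negative"

original = [("a good movie", "positive"), ("a bad movie", "negative")]
contrast = [("not a good movie", "negative"), ("not a bad movie", "positive")]

def accuracy(model, examples):
    return sum(model(x) == y for x, y in examples) / len(examples)

orig_acc = accuracy(toy_model, original)       # 1.0 on the originals
contrast_acc = accuracy(toy_model, contrast)   # 0.0 on the contrasts
gap = orig_acc - contrast_acc                  # a large gap signals brittleness
```

A near-perfect original accuracy paired with a large gap on contrast examples is exactly the kind of robustness failure the abstract reports for CheckLists and contrast sets.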

