Feb. 7, 2024, 5:48 a.m. | Lukáš Mikula, Michal Štefánik, Marek Petrovič, Petr Sojka

cs.CL updates on arXiv.org

While Large Language Models (LLMs) dominate the majority of language understanding tasks, previous work shows that some of these results are achieved by modelling spurious correlations in the training datasets. Authors commonly assess model robustness by evaluating their models on out-of-distribution (OOD) datasets for the same task, but these datasets may share the biases of the training dataset.
We propose a simple method for measuring the scale of a model's reliance on any identified spurious feature and assess the robustness towards …
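The abstract is truncated, but the core idea (quantifying how much a model leans on an identified spurious feature) can be illustrated by contrasting performance on examples where the feature agrees with the label versus examples where it conflicts. The sketch below is a minimal illustration under assumptions, not the paper's actual method; the `predict` wrapper, the example format, and the alignment rule are all hypothetical.

```python
# Minimal sketch (not the authors' exact method): estimate how much a model
# relies on a single identified spurious feature by comparing its accuracy on
# examples where the feature agrees with the label ("aligned") versus examples
# where it conflicts ("conflicting"). All names below are illustrative.

from typing import Callable, Iterable, Tuple


def reliance_gap(
    predict: Callable[[str], int],              # hypothetical wrapper: text -> predicted label
    examples: Iterable[Tuple[str, int, bool]],  # (text, gold_label, spurious_feature_present)
) -> float:
    """Accuracy(feature-aligned) - Accuracy(feature-conflicting).

    A large positive gap suggests the model leans on the spurious feature
    rather than on the underlying task signal.
    """
    aligned_correct = aligned_total = 0
    conflicting_correct = conflicting_total = 0

    for text, gold, feature_present in examples:
        correct = int(predict(text) == gold)
        # Illustrative alignment rule: the feature "points to" label 1, so it is
        # aligned whenever its presence matches the gold label. In practice the
        # rule depends on the specific bias that was identified.
        if feature_present == (gold == 1):
            aligned_total += 1
            aligned_correct += correct
        else:
            conflicting_total += 1
            conflicting_correct += correct

    if aligned_total == 0 or conflicting_total == 0:
        raise ValueError("Need both aligned and conflicting examples.")

    return aligned_correct / aligned_total - conflicting_correct / conflicting_total
```

A gap near zero indicates predictions are largely insensitive to the spurious feature, while a large gap indicates the model's accuracy collapses once the shortcut stops working.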

