May 12, 2022, 1:11 a.m. | Ben Hutchinson, Negar Rostamzadeh, Christina Greer, Katherine Heller, Vinodkumar Prabhakaran

cs.LG updates on arXiv.org

Forming a reliable judgement of a machine learning (ML) model's
appropriateness for an application ecosystem is critical for its responsible
use, and requires considering a broad range of factors including harms,
benefits, and responsibilities. In practice, however, evaluations of ML models
frequently focus on only a narrow range of decontextualized predictive
behaviours. We examine the evaluation gaps between the idealized breadth of
evaluation concerns and the observed narrow focus of actual evaluations.
Through an empirical study of papers from recent …

