Web: http://arxiv.org/abs/2110.08420

June 16, 2022, 1:12 a.m. | Kawin Ethayarajh, Yejin Choi, Swabha Swayamdipta

cs.CL updates on arXiv.org arxiv.org

Estimating the difficulty of a dataset typically involves comparing
state-of-the-art models to humans; the bigger the performance gap, the harder
the dataset is said to be. However, this comparison provides little
understanding of how difficult each instance in a given distribution is, or
what attributes make the dataset difficult for a given model. To address these
questions, we frame dataset difficulty -- w.r.t. a model $\mathcal{V}$ -- as
the lack of $\mathcal{V}$-$\textit{usable information}$ (Xu et al., 2019),
where a lower …

arxiv dataset information

More from arxiv.org / cs.CL updates on arXiv.org

Machine Learning Researcher - Saalfeld Lab

@ Howard Hughes Medical Institute - Chevy Chase, MD | Ashburn, Virginia

Project Director, Machine Learning in US Health

@ ideas42.org | Remote, US

Data Science Intern

@ NannyML | Remote

Machine Learning Engineer NLP/Speech

@ Play.ht | Remote

Research Scientist, 3D Reconstruction

@ Yembo | Remote, US

Clinical Assistant or Associate Professor of Management Science and Systems

@ University at Buffalo | Buffalo, NY