Benchmark Transparency: Measuring the Impact of Data on Evaluation

April 2, 2024, 7:51 p.m. | Venelin Kovatchev, Matthew Lease

cs.CL updates on arXiv.org

arXiv:2404.00748v1 Announce Type: new
Abstract: In this paper we present exploratory research on quantifying the impact that data distribution has on the performance and evaluation of NLP models. We propose an automated framework that measures the distribution of data points across six dimensions: ambiguity, difficulty, discriminability, length, noise, and perplexity. We use disproportional stratified sampling to measure how much the data distribution affects absolute (Acc/F1) and relative (Rank) model performance. We experiment on two datasets (SQuAD and MNLI) …
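
The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of disproportional stratified sampling in general, not the authors' framework. It assumes each example already has a numeric score along one dimension (here a made-up "length"), bins the examples into equal-width strata, and then draws a test sample whose per-stratum shares are chosen by hand rather than matching the original data. The names `stratify`, `disproportional_sample`, `accuracy`, and the toy `correct` vector are all hypothetical.

```python
import random
from collections import defaultdict


def stratify(scores, n_bins):
    """Group example indices into equal-width bins over the score range."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if all scores are equal
    strata = defaultdict(list)
    for i, s in enumerate(scores):
        b = min(int((s - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        strata[b].append(i)
    return strata


def disproportional_sample(strata, proportions, n, seed=0):
    """Draw about n indices, taking share proportions[b] from stratum b.

    The shares are set by the caller and need not match the stratum sizes,
    which is what makes the sampling disproportional.
    """
    rng = random.Random(seed)
    sample = []
    for b, share in proportions.items():
        k = round(share * n)
        pool = strata.get(b, [])
        if k > len(pool):
            raise ValueError(f"stratum {b} has {len(pool)} items, need {k}")
        sample.extend(rng.sample(pool, k))  # sampling without replacement
    return sample


def accuracy(correct, indices):
    """Fraction of the selected examples that a model got right."""
    return sum(correct[i] for i in indices) / len(indices)


# Toy data (assumption): 100 examples with lengths 0..99, and a hypothetical
# model that is correct only on examples shorter than 60.
scores = list(range(100))
correct = [s < 60 for s in scores]

strata = stratify(scores, n_bins=4)
# Oversample the longest stratum: 70% of the sample comes from bin 3.
sample = disproportional_sample(strata, {0: 0.1, 1: 0.1, 2: 0.1, 3: 0.7}, n=20)

print("accuracy on all data:     ", accuracy(correct, range(len(scores))))
print("accuracy on skewed sample:", accuracy(correct, sample))
```

Comparing the two printed numbers shows how the measured score can shift when the test distribution is skewed along one dimension. Repeating the comparison for several models would show whether their ranking shifts as well.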
