May 2, 2024, 5:36 p.m. | /u/kei147

Machine Learning www.reddit.com

There's been a lot of discussion about benchmark contamination, where models are trained on the data they are ultimately evaluated on. For example, a [recent paper](https://twitter.com/hughbzhang/status/1785877026794356858) showed that models performed substantially better on the public GSM8K than on GSM1K, a benchmark recently created by Scale AI to match GSM8K in difficulty and other measures.
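
One crude way to probe for this kind of leakage is an n-gram overlap check between benchmark items and the pre-training corpus. The sketch below illustrates that idea only; the names (`training_corpus`, `benchmark_questions`) are hypothetical placeholders, and this is not the methodology of the linked paper.

```python
# Illustrative contamination heuristic: flag benchmark questions that share
# any long token n-gram with documents in the training corpus.
from typing import Iterable, List, Set


def ngrams(text: str, n: int = 13) -> Set[str]:
    """Return the set of whitespace-token n-grams in `text`."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contaminated_items(benchmark_questions: List[str],
                       training_corpus: Iterable[str],
                       n: int = 13) -> List[int]:
    """Indices of benchmark questions overlapping the corpus by >= one n-gram."""
    corpus_grams: Set[str] = set()
    for doc in training_corpus:
        corpus_grams |= ngrams(doc, n)
    return [i for i, q in enumerate(benchmark_questions)
            if ngrams(q, n) & corpus_grams]
```

A check like this only catches near-verbatim overlap, which is part of why held-out benchmarks such as GSM1K are a more convincing test.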

Because of these concerns about benchmark contamination, it is often hard to take a research lab's claims about model performance at face value. It's difficult …

