May 5, 2024, 5:57 a.m. | /u/SurveySea7570

Machine Learning www.reddit.com

**Paper**: [https://arxiv.org/abs/2405.00332](https://arxiv.org/abs/2405.00332)

**Abstract**:

>Large language models (LLMs) have achieved impressive success on many benchmarks for mathematical reasoning. However, there is growing concern that some of this performance actually reflects dataset contamination, where data closely resembling benchmark questions leaks into the training data, instead of true reasoning ability. To investigate this claim rigorously, we commission ***Grade School Math 1000*** (**GSM1k**). GSM1k is designed to mirror the style and complexity of the established GSM8k benchmark, the gold standard for measuring elementary mathematical …
