Nov. 22, 2023, 11:58 a.m. | /u/APaperADay

Machine Learning www.reddit.com

**Paper**: [https://arxiv.org/abs/2311.12022](https://arxiv.org/abs/2311.12022)

**Code and data**: [https://github.com/idavidrein/gpqa/](https://github.com/idavidrein/gpqa/)

**Abstract**:

>We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. We ensure that the questions are high-quality and extremely difficult: experts who have or are pursuing PhDs in the corresponding domains reach 65% accuracy (74% when discounting clear mistakes the experts identified in retrospect), while highly skilled non-expert validators only reach 34% accuracy, despite spending on average over 30 minutes with unrestricted access to …

abstract accuracy biology chemistry clear dataset domain domain experts domains expert experts machinelearning mistakes multiple physics quality questions skilled

Data Engineer

@ Lemon.io | Remote: Europe, LATAM, Canada, UK, Asia, Oceania

Artificial Intelligence – Bioinformatic Expert

@ University of Texas Medical Branch | Galveston, TX

Lead Developer (AI)

@ Cere Network | San Francisco, US

Research Engineer

@ Allora Labs | Remote

Ecosystem Manager

@ Allora Labs | Remote

Founding AI Engineer, Agents

@ Occam AI | New York