Jan. 1, 2024, 10:12 a.m. | /u/Radiant_Routine_3183

r/MachineLearning (www.reddit.com)

Greetings.

Over the New Year holiday, inspired by the paper ["Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies"](https://www.semanticscholar.org/paper/Did-Aristotle-Use-a-Laptop-A-Question-Answering-Geva-Khashabi/346081161bdc8f18e2a4c4af7f51d35452b5cb01), I tried to evaluate OpenAI models on various datasets, including StrategyQA.

In short, this dataset contains yes/no questions that require multi-step reasoning and common sense. Here's an example:

```json
{
  "qid": "e1f10b57579fa6a92aa9",
  "term": "Martin Luther",
  "description": "Saxon priest, monk and theologian, seminal figure in Protestant Reformation",
  "question": "Did Martin Luther believe in Satan?",
  "answer": true,
  "facts": [
    "Martin Luther was a Protestant.",
    "Satan is also known as the devil.", …
```
