Feb. 12, 2024, 5:43 a.m. | Xiaoxuan Wang, Ziniu Hu, Pan Lu, Yanqiao Zhu, Jieyu Zhang, Satyen Subramaniam, Arjun R. Loomba, Shich

cs.LG updates on arXiv.org

Most existing Large Language Model (LLM) benchmarks on scientific problem reasoning focus on problems grounded in high-school subjects and are confined to elementary algebraic operations. To systematically examine the reasoning capabilities required for solving complex scientific problems, we introduce an expansive benchmark suite, SciBench, for LLMs. SciBench contains a carefully curated dataset featuring a range of collegiate-level scientific problems from the mathematics, chemistry, and physics domains. Based on this dataset, we conduct an in-depth benchmarking study of representative open-source …
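To make the benchmarking setup concrete, here is a minimal sketch of what a SciBench-style evaluation loop might look like. The dataset schema, the `solve` stub standing in for an LLM call, and the tolerance-based grader are all illustrative assumptions, not SciBench's actual interface.

```python
import math

# Illustrative problems with free-form numeric answers, loosely in the
# spirit of collegiate math/physics/chemistry items.
problems = [
    {"id": "math-1", "question": "integral of 2x from 0 to 3", "answer": 9.0},
    {"id": "phys-1", "question": "kinetic energy of 2 kg at 3 m/s (J)", "answer": 9.0},
    {"id": "chem-1", "question": "moles in 36 g of water", "answer": 2.0},
]

def grade(predicted: float, expected: float, rel_tol: float = 0.01) -> bool:
    """Score a numeric answer as correct if it falls within a relative tolerance."""
    return math.isclose(predicted, expected, rel_tol=rel_tol)

def solve(question: str) -> float:
    """Stand-in for an LLM call; returns canned answers (one deliberately wrong)."""
    canned = {
        "integral of 2x from 0 to 3": 9.0,
        "kinetic energy of 2 kg at 3 m/s (J)": 9.0,
        "moles in 36 g of water": 1.0,  # wrong on purpose, to exercise the grader
    }
    return canned[question]

def accuracy(problems) -> float:
    """Fraction of problems whose model answer matches the reference."""
    correct = sum(grade(solve(p["question"]), p["answer"]) for p in problems)
    return correct / len(problems)

print(f"accuracy: {accuracy(problems):.2f}")  # 2 of 3 correct -> 0.67
```

Real evaluations of this kind also vary the prompting strategy (zero-shot, few-shot, chain-of-thought) and must parse the numeric answer out of free-form model text, which this sketch omits.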
