Feb. 12, 2024, 5:43 a.m. | Xiaoxuan Wang, Ziniu Hu, Pan Lu, Yanqiao Zhu, Jieyu Zhang, Satyen Subramaniam, Arjun R. Loomba, Shichang Zhang, et al.

cs.LG updates on arXiv.org (arxiv.org)

Most existing Large Language Model (LLM) benchmarks on scientific problem reasoning focus on problems grounded in high-school subjects and are confined to elementary algebraic operations. To systematically examine the reasoning capabilities required for solving complex scientific problems, we introduce an expansive benchmark suite, SciBench, for LLMs. SciBench contains a carefully curated dataset featuring a range of collegiate-level scientific problems from the mathematics, chemistry, and physics domains. Based on the dataset, we conduct an in-depth benchmarking study of representative open-source …
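The benchmarking study described above follows the usual pattern for this kind of evaluation: prompt an LLM with each problem and score the extracted final answer against the ground truth. The following is a minimal sketch of such a loop, not the SciBench codebase itself; the problems.json file, the query_model callable, and the answer-extraction regex are illustrative assumptions.

```python
import json
import re


def extract_answer(text: str):
    """Pull the last number from a model response (illustrative heuristic)."""
    matches = re.findall(r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?", text)
    return float(matches[-1]) if matches else None


def evaluate(problems, query_model, rel_tol=0.05):
    """Score a model on numeric-answer problems with a relative-error tolerance."""
    correct = 0
    for prob in problems:
        prompt = (
            "Solve the following problem and state the final numeric answer.\n\n"
            + prob["question"]
        )
        prediction = extract_answer(query_model(prompt))
        target = float(prob["answer"])
        if prediction is not None and abs(prediction - target) <= rel_tol * abs(target):
            correct += 1
    return correct / len(problems)


if __name__ == "__main__":
    # problems.json is a hypothetical file: [{"question": ..., "answer": ...}, ...]
    with open("problems.json") as f:
        problems = json.load(f)
    # query_model is a placeholder for any LLM call (API or local model).
    accuracy = evaluate(problems, query_model=lambda p: "The answer is 42.0")
    print(f"Accuracy: {accuracy:.2%}")
```

In practice, different prompting strategies (zero-shot, few-shot, chain-of-thought) would be compared by varying how the prompt string is constructed before it is passed to the model.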

