Feb. 12, 2024, 5:43 a.m. | Xiaoxuan Wang, Ziniu Hu, Pan Lu, Yanqiao Zhu, Jieyu Zhang, Satyen Subramaniam, Arjun R. Loomba, Shich

cs.LG updates on arXiv.org

Most existing Large Language Model (LLM) benchmarks for scientific problem reasoning focus on problems grounded in high-school subjects and are confined to elementary algebraic operations. To systematically examine the reasoning capabilities required to solve complex scientific problems, we introduce SciBench, an expansive benchmark suite for LLMs. SciBench contains a carefully curated dataset featuring a range of collegiate-level scientific problems from the mathematics, chemistry, and physics domains. Based on the dataset, we conduct an in-depth benchmarking study of representative open-source …
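Benchmarks of this kind typically score a model by parsing a numeric answer from its response and comparing it to the reference solution within a tolerance. The sketch below illustrates that idea; the record schema, field names, and the 5% relative tolerance are assumptions for illustration, not SciBench's actual evaluation code.

```python
import math

# Hypothetical problem records; SciBench's real schema may differ.
problems = [
    {"question": "Evaluate the integral of x^2 from 0 to 3.", "answer": 9.0},
    {"question": "What is the pH of a 0.01 M HCl solution?", "answer": 2.0},
]

def is_correct(predicted: float, truth: float, rel_tol: float = 0.05) -> bool:
    """Accept a numeric prediction within a relative tolerance of the reference."""
    return math.isclose(predicted, truth, rel_tol=rel_tol)

def accuracy(predictions, problems):
    """Fraction of problems whose predicted value matches the reference answer."""
    hits = sum(is_correct(p, prob["answer"]) for p, prob in zip(predictions, problems))
    return hits / len(problems)

# Stand-ins for numeric answers parsed from LLM text responses.
preds = [9.02, 3.1]
print(accuracy(preds, problems))  # → 0.5
```

In practice the hard part is robustly extracting the final numeric value (with units and scientific notation) from free-form model output; the comparison step itself is as simple as shown.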

