CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks
April 2, 2024, 7:52 p.m. | Yiqing Xie, Alex Xie, Divyanshu Sheth, Pengfei Liu, Daniel Fried, Carolyn Rose
cs.CL updates on arXiv.org
Abstract: To facilitate evaluation of code generation systems across diverse scenarios, we present CodeBenchGen, a framework for creating scalable execution-based benchmarks that requires only light human guidance. Specifically, we leverage a large language model (LLM) to convert an arbitrary piece of code into an evaluation example, including test cases for execution-based evaluation. We illustrate the usefulness of our framework by creating a dataset, Exec-CSN, which includes 1,931 examples involving 293 libraries revised from code in …
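The abstract only names the pipeline at a high level: an LLM turns an arbitrary snippet into a self-contained evaluation example with test cases, which are then run for execution-based scoring. The sketch below illustrates that general pattern, not the authors' actual implementation; the `call_llm` helper and the prompt wording are placeholder assumptions, and the paper's real prompts, filtering, and sandboxing will differ.

```python
import subprocess
import sys
import tempfile
import textwrap

def call_llm(prompt: str) -> str:
    """Hypothetical LLM client; swap in any chat-completion API.

    Expected to return complete Python source code as a string.
    """
    raise NotImplementedError("plug in your LLM provider here")

# Placeholder prompt approximating the conversion step described
# in the abstract: arbitrary code -> self-contained, testable example.
PROMPT_TEMPLATE = textwrap.dedent("""\
    Convert the following code into a self-contained evaluation example.
    Return a complete Python file that defines the target function and
    a set of assert-based test cases exercising it.

    Code:
    {code}
    """)

def make_eval_example(code: str) -> str:
    """Ask the LLM to rewrite an arbitrary snippet into an eval example."""
    return call_llm(PROMPT_TEMPLATE.format(code=code))

def run_tests(example_source: str, timeout: int = 10) -> bool:
    """Execute the generated example in a subprocess.

    A candidate passes iff the script (and its asserts) exits with code 0.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(example_source)
        path = f.name
    result = subprocess.run(
        [sys.executable, path], capture_output=True, timeout=timeout
    )
    return result.returncode == 0
```

In this reading, benchmark construction is a map over a code corpus (`make_eval_example`), and evaluation replaces the generated file's function body with a model's completion before calling `run_tests`; the subprocess boundary gives a crude sandbox and timeout for untrusted generated code.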