Quantifying Contamination in Evaluating Code Generation Capabilities of Language Models
March 11, 2024, 4:41 a.m. | Martin Riddell, Ansong Ni, Arman Cohan
cs.LG updates on arXiv.org
Abstract: While large language models have achieved remarkable performance on various code generation benchmarks, there have been growing concerns regarding potential contamination of these benchmarks as they may be leaked into pretraining and finetuning data. While recent work has investigated contamination in natural language generation and understanding tasks, there has been less extensive research into how data contamination impacts the evaluation of code generation, which is critical for understanding the robustness and reliability of LLMs in …
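The truncated abstract does not show the paper's actual measurement method, but a common surface-level way to quantify benchmark contamination is n-gram overlap between benchmark examples and a training corpus. The sketch below is purely illustrative and is not taken from this paper; the function names and the choice of n are assumptions.

```python
def ngrams(tokens, n):
    """Set of all contiguous n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_overlap(benchmark_text, corpus_text, n=13):
    """Fraction of the benchmark example's n-grams that also occur in the
    training corpus. Values near 1.0 suggest the example may have leaked
    into training data; 0.0 means no overlap was detected. (Illustrative
    sketch only, not the method used in the paper.)"""
    bench = ngrams(benchmark_text.split(), n)
    corpus = ngrams(corpus_text.split(), n)
    if not bench:
        return 0.0
    return len(bench & corpus) / len(bench)
```

For example, `ngram_overlap("a b c d", "a b c d e", n=3)` returns `1.0`, while `ngram_overlap("a b c d", "x y z w", n=3)` returns `0.0`. Token-level overlap like this misses paraphrased or lightly edited copies, which is one reason contamination studies often combine it with other signals.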