March 11, 2024, 4:41 a.m. | Martin Riddell, Ansong Ni, Arman Cohan

cs.LG updates on arXiv.org arxiv.org

arXiv:2403.04811v1 Announce Type: cross
Abstract: While large language models have achieved remarkable performance on various code generation benchmarks, there have been growing concerns that these benchmarks are contaminated, as they may be leaked into pretraining and finetuning data. While recent work has investigated contamination in natural language generation and understanding tasks, there has been less extensive research into how data contamination impacts the evaluation of code generation, which is critical for understanding the robustness and reliability of LLMs in …

