Trained Without My Consent: Detecting Code Inclusion In Language Models Trained on Code
Feb. 15, 2024, 5:43 a.m. | Vahid Majdinasab, Amin Nikanjam, Foutse Khomh
cs.LG updates on arXiv.org
Abstract: Code auditing ensures that developed code adheres to standards, regulations, and copyright protection by verifying that it does not contain code from protected sources. The recent advent of Large Language Models (LLMs) as coding assistants in the software development process poses new challenges for code auditing. The datasets for training these models are mainly collected from publicly available sources. This raises the issue of intellectual property infringement, as developers' code is already included in …
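The abstract does not describe the paper's actual detection technique, but the code-inclusion problem it raises can be illustrated with a deliberately naive sketch: flag a model-generated snippet when its verbatim token n-gram overlap with a protected source exceeds a threshold. All names and the threshold below are hypothetical, not from the paper.

```python
import re

# Hypothetical illustration only -- the paper's real method is not
# given in this excerpt. We tokenize code and measure what fraction
# of the protected snippet's token 4-grams reappear verbatim in the
# generated output (an n-gram containment score).

def tokenize(code: str) -> list[str]:
    # Split code into identifiers, numbers, and single punctuation tokens.
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)

def overlap_ratio(generated: str, protected: str, n: int = 4) -> float:
    """Fraction of the protected code's token n-grams that occur
    verbatim in the generated code."""
    def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    gen = ngrams(tokenize(generated), n)
    prot = ngrams(tokenize(protected), n)
    if not prot:
        return 0.0
    return len(gen & prot) / len(prot)

def flags_inclusion(generated: str, protected: str,
                    threshold: float = 0.5) -> bool:
    # Threshold is an arbitrary illustrative choice.
    return overlap_ratio(generated, protected) >= threshold

protected = "def add(a, b):\n    return a + b"
copied = "def add(a, b):\n    return a + b  # helper"
original = "def mul(x, y):\n    return x * y"
print(flags_inclusion(copied, protected))    # True
print(flags_inclusion(original, protected))  # False
```

Real auditing would need to be far more robust (renamed identifiers, reformatting, and paraphrased logic defeat verbatim matching), which is part of why detecting code inclusion in LLM training data is a hard problem.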