An Exploratory Investigation into Code License Infringements in Large Language Model Training Datasets
March 25, 2024, 4:42 a.m. | Jonathan Katzy, Răzvan-Mihai Popescu, Arie van Deursen, Maliheh Izadi
cs.LG updates on arXiv.org
Abstract: Does the training of large language models potentially infringe upon code licenses? Furthermore, are there any datasets available that can be safely used for training these models without violating such licenses? In our study, we assess the current trends in the field and the importance of incorporating code into the training of large language models. Additionally, we examine publicly available datasets to see whether these models can be trained on them without the risk of …