March 25, 2024, 4:42 a.m. | Jonathan Katzy, Răzvan-Mihai Popescu, Arie van Deursen, Maliheh Izadi

cs.LG updates on arXiv.org

arXiv:2403.15230v1 Announce Type: cross
Abstract: Does the training of large language models potentially infringe upon code licenses? Furthermore, are there any datasets available that can be safely used for training these models without violating such licenses? In our study, we assess the current trends in the field and the importance of incorporating code into the training of large language models. Additionally, we examine publicly available datasets to see whether these models can be trained on them without the risk of …

