Unlearning Traces the Influential Training Data of Language Models
June 14, 2024, 4:42 a.m. | Masaru Isonuma, Ivan Titov
cs.CL updates on arXiv.org
Abstract: Identifying the training datasets that influence a language model's outputs is essential for minimizing the generation of harmful content and enhancing its performance. Ideally, we could measure the influence of each dataset by removing it from training; however, it is prohibitively expensive to retrain a model multiple times. This paper presents UnTrac: unlearning traces the influence of a training dataset on the model's performance. UnTrac is extremely simple; each training dataset is unlearned by gradient …
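To make the idea concrete, here is a minimal PyTorch sketch of this kind of influence estimation, not the paper's implementation. It assumes gradient ascent as the unlearning step (the abstract is cut off at "gradient …"), stands in a generic cross-entropy objective for the language-modeling loss, and uses hypothetical names (untrac_influence, evaluate) and hyperparameters chosen purely for illustration:

import copy

import torch
import torch.nn.functional as F

def evaluate(model, loader):
    # Mean cross-entropy of the model on an evaluation set.
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            total += F.cross_entropy(model(inputs), labels, reduction="sum").item()
            count += labels.numel()
    return total / count

def untrac_influence(model, train_loader, eval_loader, lr=1e-5, max_steps=100):
    # Score one training dataset: unlearn it by gradient ascent on a copy
    # of the model, then report how much the evaluation loss changes.
    base_loss = evaluate(model, eval_loader)
    unlearned = copy.deepcopy(model)  # leave the original model untouched
    optimizer = torch.optim.SGD(unlearned.parameters(), lr=lr)
    unlearned.train()
    for step, (inputs, labels) in enumerate(train_loader):
        if step >= max_steps:
            break
        optimizer.zero_grad()
        loss = F.cross_entropy(unlearned(inputs), labels)
        # Gradient ascent: step along the gradient of the *negated* loss,
        # so SGD's minimization increases the loss on this dataset.
        (-loss).backward()
        optimizer.step()
    # A large shift in evaluation loss suggests the dataset was influential.
    return evaluate(unlearned, eval_loader) - base_loss

Under these assumptions, running untrac_influence once per training dataset replaces the leave-one-out retraining the abstract calls prohibitively expensive: each dataset's influence costs only a short unlearning run rather than a full retrain.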