June 14, 2024, 4:42 a.m. | Masaru Isonuma, Ivan Titov

cs.CL updates on arXiv.org

arXiv:2401.15241v2 Announce Type: replace
Abstract: Identifying the training datasets that influence a language model's outputs is essential for minimizing the generation of harmful content and enhancing its performance. Ideally, one could measure the influence of each dataset by removing it from training; however, retraining the model multiple times is prohibitively expensive. This paper presents UnTrac: unlearning traces the influence of a training dataset on the model's performance. UnTrac is extremely simple; each training dataset is unlearned by gradient …
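The core idea the abstract describes — estimating a dataset's influence by unlearning it and measuring the resulting performance change, instead of retraining from scratch — can be illustrated with a toy sketch. Everything below is a hypothetical, minimal illustration on a one-parameter linear model, not the paper's actual method: the function names (`untrac_influence`, `grad`, `loss`), the datasets, and the ascent-based unlearning step are all assumptions made for the example.

```python
# Hedged toy sketch: trace a dataset's influence by "unlearning" it
# (gradient ascent on that dataset's loss) and measuring how much the
# model's evaluation loss changes. Not the paper's implementation.

def loss(w, xs, ys):
    # mean squared error of a 1-parameter linear model y ~ w * x
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w, xs, ys):
    # d(loss)/dw for the same model
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def untrac_influence(w, dataset, eval_set, lr=0.01, steps=10):
    """Unlearn `dataset` by ascending its loss, then report the change
    in evaluation loss. Positive = the dataset was helping the model;
    negative = the dataset was hurting it."""
    base = loss(w, *eval_set)
    w_u = w
    for _ in range(steps):
        w_u += lr * grad(w_u, *dataset)  # ascent: *increase* this dataset's loss
    return loss(w_u, *eval_set) - base

# Hypothetical toy data: one dataset consistent with the eval task (y = 2x),
# one that pulls the model toward w = 0.
helpful = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
harmful = ([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
eval_set = ([1.0, 2.0], [2.0, 4.0])

# Least-squares fit on the union of both datasets (gives w = 1.0 here).
xs = helpful[0] + harmful[0]
ys = helpful[1] + harmful[1]
w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

inf_helpful = untrac_influence(w, helpful, eval_set)  # > 0: unlearning it hurts eval
inf_harmful = untrac_influence(w, harmful, eval_set)  # < 0: unlearning it helps eval
```

The point of the sketch is the sign convention: unlearning a beneficial dataset degrades evaluation performance (positive influence), while unlearning a harmful one improves it (negative influence), and neither measurement required retraining the model from scratch.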
