Feb. 15, 2024, 5:46 a.m. | Matthieu Meeus, Igor Shilov, Manuel Faysse, Yves-Alexandre de Montjoye

cs.CL updates on arXiv.org

arXiv:2402.09363v1 Announce Type: new
Abstract: Questions of fair use of copyright-protected content to train Large Language Models (LLMs) are being very actively debated. Document-level inference has been proposed as a new task: inferring from black-box access to the trained model whether a piece of content has been seen during training. SOTA methods, however, rely on naturally occurring memorization of (part of) the content. While very effective against models that memorize a lot, we hypothesize, and later confirm, that they will not work …
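For intuition, here is a minimal sketch (not the paper's method) of the kind of memorization-based baseline the abstract alludes to: score a document by the model's average per-token loss and flag low-loss documents as likely training members. The model name and decision threshold below are illustrative assumptions; a real evaluation would calibrate the threshold on documents known to be outside the training set.

```python
# Sketch of a loss-based baseline for document-level membership inference.
# Assumptions: "gpt2" stands in for any causal LM with accessible loss,
# and THRESHOLD is a hypothetical value calibrated on known non-member docs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"   # illustrative model choice
THRESHOLD = 3.0       # illustrative, would need calibration

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def document_nll(text: str, max_length: int = 1024) -> float:
    """Average per-token negative log-likelihood of `text` under the model."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

def predict_member(text: str) -> bool:
    """Lower loss suggests memorization, i.e. the document may have been seen in training."""
    return document_nll(text) < THRESHOLD

if __name__ == "__main__":
    doc = "Some candidate copyright-protected document text..."
    print(f"avg NLL = {document_nll(doc):.3f}, predicted member = {predict_member(doc)}")
```

This baseline only works when the model has memorized (part of) the content strongly enough to depress its loss, which is exactly the reliance on natural memorization that the paper questions.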
