Ghost Sentence: A Tool for Everyday Users to Copyright Data from Large Language Models
March 26, 2024, 4:43 a.m. | Shuai Zhao, Linchao Zhu, Ruijie Quan, Yi Yang
cs.LG updates on arXiv.org arxiv.org
Abstract: Web user data plays a central role in the ecosystem of pre-trained large language models (LLMs) and their fine-tuned variants. Billions of data are crawled from the web and fed to LLMs. How can everyday web users confirm if LLMs misuse their data without permission? In this work, we suggest that users repeatedly insert personal passphrases into their documents, enabling LLMs to memorize them. These concealed passphrases in user documents, referred to as ghost sentences, …
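The insertion step described in the abstract can be illustrated with a minimal sketch. It assumes a document is already split into a list of sentences and that the passphrase is placed at random positions. The function name `insert_ghost_sentences` and the `copies` and `seed` parameters are hypothetical and do not come from the paper, whose actual procedure is not shown in this excerpt.

```python
import random


def insert_ghost_sentences(sentences, passphrase, copies=3, seed=0):
    """Return a new list with `passphrase` inserted `copies` times.

    Hypothetical illustration only: positions are drawn uniformly at
    random, which is an assumption, not the authors' stated method.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    result = list(sentences)   # copy; the caller's list is left untouched
    for _ in range(copies):
        # randint is inclusive on both ends, so the passphrase may land
        # before the first sentence or after the last one.
        position = rng.randint(0, len(result))
        result.insert(position, passphrase)
    return result


if __name__ == "__main__":
    document = ["First sentence.", "Second sentence.", "Third sentence."]
    marked = insert_ghost_sentences(document, "my personal passphrase", copies=2)
    print(" ".join(marked))
```

The original sentences keep their relative order, so the document reads the same apart from the repeated passphrase.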