March 29, 2024, 4:41 a.m. | Saeid Asgari Taghanaki, Joseph Lambourne

cs.LG updates on arXiv.org

arXiv:2403.19050v1 Announce Type: new
Abstract: The advent of generative AI models has revolutionized digital content creation, yet it introduces challenges in maintaining copyright integrity due to generative parroting, where models mimic their training data too closely. Our research presents a novel approach to tackle this issue by employing an overfitted Masked Autoencoder (MAE) to detect such parroted samples effectively. We establish a detection threshold based on the mean loss across the training dataset, allowing for the precise identification of parroted …
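The thresholding step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the per-sample reconstruction losses of the overfitted MAE have already been computed elsewhere (the MAE itself is not shown). It also assumes that a generated sample is flagged when its loss is at or below the mean training loss, because the abstract is truncated before stating the comparison direction. The function names `mean_loss_threshold` and `flag_parroted` are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def mean_loss_threshold(train_losses: Sequence[float]) -> float:
    """Detection threshold: the mean per-sample reconstruction loss
    of the overfitted model over its training set."""
    if len(train_losses) == 0:
        raise ValueError("need at least one training-set loss")
    return mean(train_losses)


def flag_parroted(gen_losses: Sequence[float], threshold: float) -> List[bool]:
    """Flag generated samples as likely parroted.

    ASSUMPTION: a loss at or below the threshold means the overfitted
    model reconstructs the sample about as well as typical training
    data, which is taken as a sign of copying.
    """
    return [loss <= threshold for loss in gen_losses]


if __name__ == "__main__":
    # Illustrative, made-up loss values (not results from the paper).
    train_losses = [0.25, 0.5, 0.75]
    threshold = mean_loss_threshold(train_losses)
    print(threshold)                                      # 0.5
    print(flag_parroted([0.1, 0.5, 2.0], threshold))      # [True, True, False]
```

The assumed direction rests on the idea of deliberate overfitting: a model fitted very closely to its training set presumably reconstructs training samples with unusually low loss, so a generated sample that it reconstructs as well as an average training sample is a candidate near-copy.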