March 29, 2024, 4:41 a.m. | Saeid Asgari Taghanaki, Joseph Lambourne

cs.LG updates on arXiv.org

arXiv:2403.19050v1 Announce Type: new
Abstract: The advent of generative AI models has revolutionized digital content creation, yet it introduces challenges in maintaining copyright integrity due to generative parroting, where models mimic their training data too closely. Our research presents a novel approach to tackle this issue by employing an overfitted Masked Autoencoder (MAE) to detect such parroted samples effectively. We establish a detection threshold based on the mean loss across the training dataset, allowing for the precise identification of parroted …
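The thresholding step the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes per-sample reconstruction losses from the overfitted MAE have already been computed, and it assumes that a candidate whose loss is at or below the training-set mean is the one flagged as parroted (the abstract is cut off before it states the comparison direction). The function names and the loss values are hypothetical.

```python
from statistics import fmean


def mean_loss_threshold(train_losses):
    """Return the mean per-sample loss over the training set.

    The abstract says the detection threshold is based on this mean;
    using the mean itself with no scaling is an assumption here.
    """
    losses = list(train_losses)
    if not losses:
        raise ValueError("train_losses must not be empty")
    return fmean(losses)


def flag_parroted(candidate_losses, threshold):
    """Return one boolean per candidate: True if flagged as parroted.

    Assumption: an overfitted model reconstructs near-copies of its
    training data with low loss, so loss <= threshold means "parroted".
    """
    return [loss <= threshold for loss in candidate_losses]


# Made-up loss values, for illustration only.
train_losses = [0.02, 0.04, 0.03, 0.03]
threshold = mean_loss_threshold(train_losses)

# One candidate with low reconstruction loss, one with high loss.
flags = flag_parroted([0.01, 0.50], threshold)
```

In practice the losses would come from running each image through the trained autoencoder; that step is omitted because the abstract gives no details about the model or its loss function.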
