Feb. 27, 2024, 5:50 a.m. | Yiping Song, Juhua Zhang, Zhiliang Tian, Yuxin Yang, Minlie Huang, Dongsheng Li

cs.CL updates on arXiv.org

arXiv:2402.16515v1 Announce Type: new
Abstract: As sufficient data are not always publicly accessible for model training, researchers either exploit the limited data with advanced learning algorithms or expand the dataset via data augmentation (DA). Conducting DA in a private domain requires privacy protection approaches (i.e., anonymization and perturbation), but those methods cannot provide protection guarantees. Differential privacy (DP) learning methods theoretically bound the protection but are not well suited to generating pseudo text samples with large models. In this paper, we transfer DP-based pseudo …
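For background on the guarantee mentioned above, the sketch below is not taken from the paper; the function name, parameters, and numpy-only setup are illustrative assumptions. It shows the core of a standard DP-SGD-style update, the usual way DP learning methods bound each training record's influence: per-example gradients are clipped to a fixed L2 norm and calibrated Gaussian noise is added before averaging.

# Minimal DP-SGD-style aggregation sketch (assumed example, not the paper's method).
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Aggregate per-example gradients with clipping and Gaussian noise.

    per_example_grads: array of shape (batch_size, num_params).
    Returns a single noisy gradient of shape (num_params,).
    """
    rng = rng or np.random.default_rng()
    # Clip each example's gradient so its L2 norm is at most clip_norm,
    # bounding the contribution of any single record.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Add Gaussian noise calibrated to the clipping bound, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / len(clipped)

The privacy/utility trade-off is controlled by clip_norm and noise_multiplier; the resulting (epsilon, delta) guarantee is tracked separately with a privacy accountant.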
