Feb. 2, 2024, 3:40 p.m. | Atiquer Rahman Sarkar, Yao-Shun Chuang, Noman Mohammed, Xiaoqian Jiang

cs.CL updates on arXiv.org

For sharing privacy-sensitive data, de-identification is commonly regarded as adequate for safeguarding privacy. Synthetic data is also being considered as a privacy-preserving alternative. Recent successes with numerical and tabular data generative models and the breakthroughs in large generative language models raise the question of whether synthetically generated clinical notes could be a viable alternative to real notes for research purposes. In this work, we demonstrated that (i) de-identification of real clinical notes does not protect records against a membership inference …
