Feb. 12, 2024, 5:43 a.m. | Zikai Xiong Niccolò Dalmasso Shubham Sharma Freddy Lecue Daniele Magazzeni Vamsi K. Potluru Tucker B

cs.LG updates on arXiv.org

Data distillation and coresets have emerged as popular approaches for handling large-scale datasets, generating a smaller representative set of samples for downstream learning tasks. At the same time, machine learning is increasingly applied to decision-making processes at a societal level, making it imperative for modelers to address biases towards subgroups present in the data. Current approaches create fair synthetic representative samples by optimizing local properties relative to the original samples, but their effect on downstream learning processes …
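To make the coreset idea concrete, here is a minimal sketch of one classic selection strategy, farthest-point (k-center) greedy sampling, which picks a small subset of points that covers the dataset. This is an illustrative example only, not the optimal-transport method the paper itself proposes; the function name and toy data are assumptions.

```python
import numpy as np

def greedy_k_center(X, k, seed=0):
    """Select k representative rows of X via farthest-point greedy.

    A classic coreset-style heuristic: each new sample is the point
    farthest from those chosen so far, so the subset spreads over
    the data and can stand in for it in downstream learning.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    chosen = [int(rng.integers(n))]
    # distance of every point to its nearest chosen representative
    dist = np.linalg.norm(X - X[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))  # farthest remaining point
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(chosen)

# toy data: two well-separated clusters of 50 points each;
# a size-2 coreset should pick one point from each cluster
X = np.vstack([np.zeros((50, 2)), np.ones((50, 2)) * 10.0])
idx = greedy_k_center(X, k=2)
```

Fairness-aware variants, like the one the abstract describes, additionally constrain such selections so that subgroups are represented without bias.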

