April 9, 2024, 4:42 a.m. | Artem Vysogorets, Kartik Ahuja, Julia Kempe

cs.LG updates on arXiv.org

arXiv:2404.05579v1 Announce Type: new
Abstract: In the era of exceptionally data-hungry models, careful selection of the training data is essential to mitigate the extensive costs of deep learning. Data pruning offers a solution by removing redundant or uninformative samples from the dataset, which yields faster convergence and improved neural scaling laws. However, little is known about its impact on classification bias of the trained models. We conduct the first systematic study of this effect and reveal that existing data pruning …
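To make the technique concrete, below is a minimal sketch of score-based data pruning, the general approach the abstract describes: each training sample is assigned an informativeness score and only the top-scoring fraction is kept. The scoring function, the keep fraction, and the toy data are illustrative assumptions; this excerpt does not show the paper's specific pruning methods.

```python
# A minimal sketch of score-based data pruning. Scores, keep fraction,
# and toy data are assumptions for illustration, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: feature matrix X and binary labels y.
X = rng.normal(size=(1000, 20))
y = rng.integers(0, 2, size=1000)

# Hypothetical per-sample "informativeness" scores; real pruning methods
# derive these from training dynamics (e.g., loss values, forgetting
# events, or gradient norms).
scores = rng.random(1000)

def prune(X, y, scores, keep_fraction=0.5):
    """Keep the highest-scoring fraction of samples, discarding the rest."""
    n_keep = int(len(scores) * keep_fraction)
    keep_idx = np.argsort(scores)[-n_keep:]  # indices of top-scoring samples
    return X[keep_idx], y[keep_idx]

X_pruned, y_pruned = prune(X, y, scores, keep_fraction=0.5)
print(X_pruned.shape)  # (500, 20): half the dataset retained for training
```

Because pruning decisions are made per sample, a score distribution that correlates with class membership can shift the retained class balance, which is one way pruning can affect classification bias.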

