Feb. 7, 2024, 5:43 a.m. | Abdoulaye SakhoLPSM Erwan ScornetLPSM Emmanuel Malherbe

cs.LG updates on arXiv.org arxiv.org

Synthetic Minority Oversampling Technique (SMOTE) is a common rebalancing strategy for handling imbalanced data sets. Asymptotically, we prove that SMOTE (with default parameter) regenerates the original distribution by simply copying the original minority samples. We also prove that SMOTE density vanishes near the boundary of the support of the minority distribution, therefore justifying the common BorderLine SMOTE strategy. Then we introduce two new SMOTE-related strategies, and compare them with state-of-the-art rebalancing procedures. We show that rebalancing strategies are only required …

cs.lg data data sets distribution experimental limitations near oversampling samples smote stat.ml strategies strategy study support synthetic

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

RL Analytics - Content, Data Science Manager

@ Meta | Burlingame, CA

Research Engineer

@ BASF | Houston, TX, US, 77079