Feb. 15, 2024, 5:41 a.m. | Won-Seok Choi, Hyundo Lee, Dong-Sig Han, Junseok Park, Heeyeon Koo, Byoung-Tak Zhang

cs.LG updates on arXiv.org arxiv.org

arXiv:2402.08963v1 Announce Type: new
Abstract: Recent machine learning algorithms have been developed using well-curated datasets, which often require substantial cost and resources. On the other hand, the direct use of raw data often leads to overfitting towards frequently occurring class information. To address class imbalances cost-efficiently, we propose an active data filtering process during self-supervised pre-training in our novel framework, Duplicate Elimination (DUEL). This framework integrates an active memory inspired by human working memory and introduces distinctiveness information, which measures …

abstract algorithms arxiv class cost cs.ai cs.lg data datasets duplicate information leads machine machine learning machine learning algorithms memory overfitting raw resources type

AI Research Scientist

@ Vara | Berlin, Germany and Remote

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Business Data Analyst

@ Alstom | Johannesburg, GT, ZA