Web: http://arxiv.org/abs/2209.10042

Sept. 22, 2022, 1:11 a.m. | Hyeon Jeon, Michael Aupetit, DongHwa Shin, Aeri Cho, Seokhyeon Park, Jinwook Seo

cs.LG updates on arXiv.org

We address the lack of reliability in benchmarking clustering techniques
based on labeled datasets. A standard scheme in external clustering validation
is to use class labels as ground truth clusters, based on the assumption that
each class forms a single, clearly separated cluster. However, because this
cluster-label matching (CLM) assumption often breaks, the absence of a sanity
check for the CLM of benchmark datasets casts doubt on the validity of
external validations. Still, evaluating the degree of CLM is …
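To make the "class labels as ground truth" scheme concrete, here is a minimal sketch of one common external validation measure, the adjusted Rand index (ARI), written in plain Python. This is a generic illustration, not the authors' method. The function name `adjusted_rand_index` is hypothetical, and ARI is only one of several external measures that could be used. Under the CLM assumption, a technique that recovers the classes scores close to 1. If the classes themselves overlap, even a good clustering can score low, which is the reliability concern raised above.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between class labels (treated as ground-truth
    clusters) and a predicted clustering. Hypothetical helper for illustration."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have equal length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two points: nothing to compare, treat as full agreement
        return 1.0

    # Contingency table: number of points in each (class, cluster) cell
    contingency = Counter(zip(labels_true, labels_pred))
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)

    # Pairs of points grouped together in both partitions / in each partition
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_classes = sum(comb(c, 2) for c in class_sizes.values())
    sum_clusters = sum(comb(c, 2) for c in cluster_sizes.values())

    # Chance-corrected agreement: (index - expected) / (max - expected)
    expected = sum_classes * sum_clusters / total_pairs
    max_index = (sum_classes + sum_clusters) / 2
    if max_index == expected:
        # Only happens when both partitions are identical and trivial
        # (all points in one cluster, or all singletons)
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    classes = [0, 0, 1, 1]
    print(adjusted_rand_index(classes, [1, 1, 0, 0]))  # same grouping, renamed labels
    print(adjusted_rand_index(classes, [0, 1, 0, 1]))  # grouping that cuts across classes
```

ARI ignores the cluster IDs themselves and depends only on which points are grouped together. A renamed but identical grouping therefore scores 1.0, while a grouping that cuts across every class falls below 0.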


