April 8, 2024, 4:45 a.m. | Katharine M. Clark, Paul D. McNicholas

stat.ML updates on arXiv.org arxiv.org

arXiv:1907.01136v5 Announce Type: replace-cross
Abstract: Clustering, or unsupervised classification, is a task often plagued by outliers. Yet there is a paucity of work on handling outliers in clustering. Outlier identification algorithms tend to fall into three broad categories: outlier inclusion, outlier trimming, and \textit{post hoc} outlier identification methods, with the former two often requiring pre-specification of the number of outliers. The fact that sample Mahalanobis distance is beta-distributed is used to derive an approximate distribution for the log-likelihoods of subset …

abstract algorithms arxiv classification clustering identification inclusion outlier outliers stat.me stat.ml type unsupervised work

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Associate Data Engineer

@ Nominet | Oxford/ Hybrid, GB

Data Science Senior Associate

@ JPMorgan Chase & Co. | Bengaluru, Karnataka, India