April 3, 2024, 4:41 a.m. | Andrew Draganov, David Saulpic, Chris Schwiegelshohn

cs.LG updates on arXiv.org arxiv.org

arXiv:2404.01936v1 Announce Type: new
Abstract: We study the theoretical and practical runtime limits of k-means and k-median clustering on large datasets. Since effectively all clustering methods are slower than the time it takes to read the dataset, the fastest approach is to quickly compress the data and perform the clustering on the compressed representation. Unfortunately, there is no universal best choice for compressing the number of points - while random sampling runs in sublinear time and coresets provide theoretical guarantees, …

abstract accuracy arxiv big big data clustering cs.ds cs.lg data dataset datasets k-means large datasets practical study type

Founding AI Engineer, Agents

@ Occam AI | New York

AI Engineer Intern, Agents

@ Occam AI | US

AI Research Scientist

@ Vara | Berlin, Germany and Remote

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Machine Learning Engineer - Sr. Consultant level

@ Visa | Bellevue, WA, United States