Settling Time vs. Accuracy Tradeoffs for Clustering Big Data
April 3, 2024, 4:41 a.m. | Andrew Draganov, David Saulpic, Chris Schwiegelshohn
Source: cs.LG updates on arXiv.org
Abstract: We study the theoretical and practical runtime limits of k-means and k-median clustering on large datasets. Since effectively all clustering methods are slower than the time it takes to read the dataset, the fastest approach is to quickly compress the data and perform the clustering on the compressed representation. Unfortunately, there is no universal best choice for compressing the number of points - while random sampling runs in sublinear time and coresets provide theoretical guarantees, …
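The compress-then-cluster idea in the abstract can be illustrated with a minimal sketch. It uses uniform random sampling, the simplest of the compression schemes the abstract mentions, followed by weighted Lloyd iterations on the sample. This is not the authors' implementation: the function names (`uniform_sample`, `weighted_kmeans`, `kmeans_cost`), the synthetic two-blob data, the sample size of 50, and the choice of Lloyd's algorithm are all assumptions made for illustration, and uniform sampling carries none of the guarantees that coresets provide.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def uniform_sample(points, m, rng):
    """Compress by uniform sampling: keep m points, each weighted n/m."""
    n = len(points)
    return rng.sample(points, m), [n / m] * m


def weighted_kmeans(points, weights, k, rng, iters=20):
    """Plain Lloyd iterations on weighted points (illustrative, unoptimized)."""
    dim = len(points[0])
    centers = rng.sample(points, k)
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            # Assign each point to its nearest current center.
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            totals[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        # Move each center to the weighted mean of its points;
        # a center with no assigned points stays where it is.
        centers = [
            tuple(s / totals[j] for s in sums[j]) if totals[j] > 0 else centers[j]
            for j in range(k)
        ]
    return centers


def kmeans_cost(points, centers):
    """k-means cost: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Hypothetical data: two well-separated Gaussian blobs of 500 points each.
rng = random.Random(0)
data = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
    for cx, cy in [(0.0, 0.0), (10.0, 10.0)]
    for _ in range(500)
]

# Cluster only the 50-point sample, then evaluate on the full dataset.
sample, weights = uniform_sample(data, 50, rng)
centers = weighted_kmeans(sample, weights, 2, rng)
full_cost = kmeans_cost(data, centers)
```

The clustering step touches only the sample, so its cost depends on the sample size rather than on the dataset size. The price is accuracy: on data with small or outlying clusters, a uniform sample can miss them entirely, which is the kind of tradeoff the abstract refers to.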