April 1, 2024, 1:57 p.m. | /u/bernful

Data Science www.reddit.com

I am attempting to cluster stores based on their sales. I can either do:

1. Univariate K-Means clustering by way of the Ckmeans.1d.dp package in R. This works perfectly fine; the only two cons are figuring out the upper limit on K and possibly explainability to the client.
2. Fixed cluster boundaries. In this case, I average the sales of all stores and create boundaries like: 50% below average, 25% below average, 25% above average, 50% above average. This is …
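
For what it's worth, option 2 could be sketched roughly like this in Python. This is only an illustration under assumptions: the function name `fixed_boundary_clusters`, the store names, and the sales figures are all made up, and I am reading the boundaries as cutpoints at 50%, 75%, 125%, and 150% of the average, which gives five bands. A value that lands exactly on a cutpoint goes into the higher band here, which is an arbitrary choice.

```python
from bisect import bisect_right
from statistics import mean


def fixed_boundary_clusters(sales, offsets=(-0.50, -0.25, 0.25, 0.50)):
    """Assign each store to a band defined relative to average sales.

    `sales` maps store name -> sales figure (hypothetical input shape).
    `offsets` are fractional distances from the average, in ascending
    order; they are an assumed reading of the boundaries in the post.
    """
    avg = mean(sales.values())
    # Turn relative offsets into absolute cutpoints, e.g. -0.50 -> 0.5 * avg.
    cutpoints = [avg * (1 + offset) for offset in offsets]
    # bisect_right counts how many cutpoints are <= the value, so band 0 is
    # "more than 50% below average" and band 4 is "50% or more above average".
    return {store: bisect_right(cutpoints, value) for store, value in sales.items()}


# Made-up example data: the average is 100, so the cutpoints are 50, 75, 125, 150.
sales = {"A": 40.0, "B": 80.0, "C": 100.0, "D": 130.0, "E": 150.0}
labels = fixed_boundary_clusters(sales)
print(labels)
```

The band edges depend only on the average, so they are easy to explain to a client, but they ignore how the sales values are actually distributed, which is the part a univariate K-Means fit would pick up.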
