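k-means++ clustering (Arthur2007TreeDist) improves the speed and
accuracy of standard k-means clustering with kmeans() (Hartigan1979TreeDist)
by preferring initial cluster centres that are far from those already chosen.
After a first centre is picked at random, each further centre is drawn with
probability proportional to its squared distance from the nearest existing centre.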
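The seeding rule can be sketched in base R. This is a minimal illustration of the squared-distance (D²) weighting used by k-means++, assuming Euclidean distance. SeedCentres() is a hypothetical helper and is not necessarily how KMeansPP() is implemented.

```r
# Hypothetical helper: choose k initial centres by k-means++ (D^2) seeding.
# Assumes Euclidean distance; not necessarily TreeDist's implementation.
SeedCentres <- function(x, k) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(k >= 1, k <= n)
  # First centre: a row chosen uniformly at random
  chosen <- sample.int(n, 1)
  # Squared distance from each point to its nearest chosen centre
  d2 <- colSums((t(x) - x[chosen, ])^2)
  while (length(chosen) < k) {
    # Next centre: sampled with probability proportional to d2, so points
    # far from existing centres are preferred (chosen points have d2 = 0)
    nxt <- sample.int(n, 1, prob = d2)
    chosen <- c(chosen, nxt)
    d2 <- pmin(d2, colSums((t(x) - x[nxt, ])^2))
  }
  x[chosen, , drop = FALSE]
}
```

If the returned matrix is passed as centers, for example kmeans(x, centers = SeedCentres(x, 5)), ordinary k-means then starts from these seeds.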
A scalable version of the algorithm has been proposed for larger data sets
(Bahmani2012TreeDist), but is not implemented here.
Arguments
- x
Numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).
- k
Integer specifying the number of clusters, k.
- nstart
Positive integer specifying how many random sets of initial centres should be chosen.
- ...
Additional arguments passed to kmeans().
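The roles of nstart and ... can be illustrated with a hypothetical wrapper, KMeansPPSketch(). It assumes that nstart counts independent k-means++ seedings and that the run with the smallest total within-cluster sum of squares (tot.withinss) is kept, mirroring how kmeans() treats its own nstart. TreeDist's actual KMeansPP() may differ.

```r
# Hypothetical sketch: repeat k-means++ seeding 'nstart' times, run
# stats::kmeans() from each seed, and keep the run with the smallest
# tot.withinss. Extra arguments in '...' are forwarded to kmeans().
# This is an assumption about nstart, not TreeDist's code.
KMeansPPSketch <- function(x, k = 2, nstart = 10, ...) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(k >= 1, k <= n, nstart >= 1)
  # One round of D^2 seeding (squared Euclidean distance assumed)
  seed <- function() {
    chosen <- sample.int(n, 1)
    d2 <- colSums((t(x) - x[chosen, ])^2)
    while (length(chosen) < k) {
      nxt <- sample.int(n, 1, prob = d2)
      chosen <- c(chosen, nxt)
      d2 <- pmin(d2, colSums((t(x) - x[nxt, ])^2))
    }
    x[chosen, , drop = FALSE]
  }
  best <- NULL
  for (i in seq_len(nstart)) {
    fit <- kmeans(x, centers = seed(), ...)
    if (is.null(best) || fit$tot.withinss < best$tot.withinss) best <- fit
  }
  best
}
```

For example, KMeansPPSketch(x, k = 5, nstart = 5, iter.max = 20) forwards iter.max to kmeans() and returns an ordinary "kmeans" object.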
See also
Other cluster functions:
cluster-statistics
Examples
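library("TreeDist") # provides KMeansPP()
# Generate 25 random points in five groups of five;
# pch marks each point's true group in the plots
set.seed(1)
x <- cbind(c(rnorm(10, -5), rnorm(5, 1), rnorm(10, 6)),
           c(rnorm(5, 0), rnorm(15, 4), rnorm(5, 0)))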
# Conventional k-means may perform poorly
klusters <- kmeans(x, centers = 5)
plot(x, col = klusters$cluster, pch = rep(15:19, each = 5))
# Here, k-means++ recovers a better clustering
plusters <- KMeansPP(x, k = 5)
plot(x, col = plusters$cluster, pch = rep(15:19, each = 5))