
k-means++ clustering (Arthur & Vassilvitskii 2007) improves the speed and accuracy of standard k-means clustering (Hartigan & Wong 1979) by preferring initial cluster centres that are far from one another. A scalable version of the algorithm has been proposed for larger data sets (Bahmani et al. 2012), but is not implemented here.
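The seeding idea can be illustrated with a minimal base-R sketch. This is not the package's implementation: `SeedCentres` is a hypothetical helper written for illustration, and it assumes that `x` contains at least `k` distinct rows. It draws the first centre uniformly at random, then draws each later centre with probability proportional to its squared distance from the nearest centre already chosen, so distant points are preferred.

```r
# Hypothetical helper (not part of the package): choose k rows of x as
# initial centres using squared-distance-weighted sampling.
SeedCentres <- function(x, k) {
  x <- as.matrix(x)
  n <- nrow(x)
  centres <- integer(k)

  # First centre: drawn uniformly at random
  centres[1] <- sample.int(n, 1)

  # Squared Euclidean distance from every point to its nearest chosen centre
  d2 <- colSums((t(x) - x[centres[1], ])^2)

  for (i in seq_len(k - 1) + 1) {
    # Next centre: drawn with probability proportional to d2, so points far
    # from all existing centres are favoured; chosen points have d2 = 0
    centres[i] <- sample.int(n, 1, prob = d2)
    d2 <- pmin(d2, colSums((t(x) - x[centres[i], ])^2))
  }

  x[centres, , drop = FALSE]
}

# Usage: seed standard k-means with the selected centres
set.seed(1)
x <- cbind(c(rnorm(10, -5), rnorm(10, 6)),
           c(rnorm(10, 0), rnorm(10, 4)))
seeds <- SeedCentres(x, k = 2)
fit <- kmeans(x, centers = seeds)
```

Passing a matrix to the `centers` argument of `kmeans` makes it start from those rows instead of choosing its own random starting points.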

Usage

KMeansPP(x, k = 2, nstart = 10, ...)

Arguments

x

Numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).

k

Integer specifying the number of clusters, k.

nstart

Positive integer specifying how many random sets of initial centres should be chosen.

...

Additional arguments passed to kmeans.

References

See also

kmeans

Other cluster functions: cluster-statistics

Examples

# Generate random points
set.seed(1)
x <- cbind(c(rnorm(10, -5), rnorm(5, 1), rnorm(10, 6)),
           c(rnorm(5, 0), rnorm(15, 4), rnorm(5, 0)))

# Conventional k-means may perform poorly
klusters <- kmeans(x, centers = 5)
plot(x, col = klusters$cluster, pch = rep(15:19, each = 5))


# Here, k-means++ recovers a better clustering
plusters <- KMeansPP(x, k = 5)
plot(x, col = plusters$cluster, pch = rep(15:19, each = 5))