k-means++ clustering (Arthur and Vassilvitskii 2007)
improves the speed and
accuracy of standard `kmeans`

clustering
(Hartigan and Wong 1979)
by preferring initial cluster centres
that are far from others.
A scalable version of the algorithm has been proposed for larger data sets
(Bahmani et al. 2012)
, but is not implemented here.

## Arguments

- x
Numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).

- k
Integer specifying the number of clusters,

*k*.- nstart
Positive integer specifying how many random sets should be chosen

- ...
additional arguments passed to

`kmeans`

## References

Arthur D, Vassilvitskii S (2007).
“K-Means++: The Advantages of Careful Seeding.”
In *Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms*, SODA '07, 1027–1035.

Bahmani B, Moseley B, Vattani A, Kumar R, Vassilvitskii S (2012).
“Scalable K-Means++.”
*arXiv*.
doi:10.48550/arXiv.1203.6402
, 1203.6402.

Hartigan JA, Wong MA (1979).
“Algorithm AS 136: a *K*-means clustering algorithm.”
*Journal of the Royal Statistical Society. Series C (Applied Statistics)*, **28**(1), 100–108.
doi:10.2307/2346830
.

## See also

Other cluster functions:
`cluster-statistics`

## Examples

```
# Generate random points
set.seed(1)
x <- cbind(c(rnorm(10, -5), rnorm(5, 1), rnorm(10, 6)),
c(rnorm(5, 0), rnorm(15, 4), rnorm(5, 0)))
# Conventional k-means may perform poorly
klusters <- kmeans(x, cent = 5)
plot(x, col = klusters$cluster, pch = rep(15:19, each = 5))
# Here, k-means++ recovers a better clustering
plusters <- KMeansPP(x, k = 5)
plot(x, col = plusters$cluster, pch = rep(15:19, each = 5))
```