On uniqueness of the set of k-means

Javier Cárcamo, Antonio Cuevas, Luis A. Rodríguez

Abstract

We provide necessary and sufficient conditions for the uniqueness of the k-means set of a probability distribution. This uniqueness problem is related to the choice of k: depending on the underlying distribution, some values of this parameter could lead to multiple sets of k-means, which hampers the interpretation of the results and/or the stability of the algorithms. We give a general assessment of the consistency of the empirical k-means adapted to the setting of non-uniqueness and determine the asymptotic distribution of the within-cluster sum of squares (WCSS). We also provide statistical characterizations of k-means uniqueness in terms of the asymptotic behavior of the empirical WCSS. As a consequence, we derive a bootstrap test for uniqueness of the set of k-means. The results are illustrated with examples of different types of non-uniqueness, and the performance of the proposed methodology is checked by simulations.
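
For orientation, the objects named in the abstract can be written down in the standard k-means formulation. This is only a sketch: the symbols $P$, $X_1,\dots,X_n$, $A$, $W_k(P)$ and $W_{n,k}$ are assumptions introduced here and need not match the paper's notation, $P$ is assumed to have a finite second moment, and the empirical WCSS is normalized by $n$ (some authors omit this factor).

```latex
% Population WCSS for a distribution P on R^d (assumed notation):
% the minimum runs over sets A of at most k points.
W_k(P) \;=\; \min_{A \subset \mathbb{R}^d,\ |A| \le k}
  \int \min_{a \in A} \|x - a\|^2 \, \mathrm{d}P(x)

% Empirical WCSS from a sample X_1, ..., X_n drawn from P:
W_{n,k} \;=\; \min_{A \subset \mathbb{R}^d,\ |A| \le k}
  \frac{1}{n} \sum_{i=1}^{n} \min_{a \in A} \|X_i - a\|^2
```

In this formulation, a set $A$ attaining the first minimum is a set of k-means of $P$, and the uniqueness question is whether exactly one such minimizing set exists.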

Paper Structure

This paper contains 16 sections, 38 equations, 14 figures.

Figures (14)

  • Figure 1: Model U$r$C3k2. Density functions in \ref{density-uniform} with the sets of $2$-means (in black, red and green) for $r=0.1$ (upper panel), $r=3\sqrt{2}-4$ (middle panel) and $r=0.4$ (lower panel).
  • Figure 2: Model C1k2. (a) Density plot and (b) contour plot. The two points in black are one of the sets of $2$-means located on a circumference centered at the origin of radius $\sqrt{2/\pi}$.
  • Figure 3: Model C2k3. (a) Density plot and (b) contour plot. The two sets of three points in black and red are two of the infinitely many sets of $3$-means of the case $\text{CNU}(3)$.
  • Figure 4: Model TC3k2. (a) Density plot and (b) contour plot. The three sets of 2 points in black, red and green are the sets of $2$-means of the case $\text{DNU}(2)$.
  • Figure 5: Models C2k2-2 (left panels) and C2k2-3 (right panels). Density plots in (a) and (c) and contour plots in (b) and (d). The points in black are the (unique) sets of $2$-means.
  • ...and 9 more figures

Theorems & Definitions (14)

  • Example 1: Model U$r$C3k2
  • Example 2: Model C1k2
  • Example 3: Model C2k3
  • Example 4: Model TC3k2, Gaussian triangle
  • Example 5: Models C2k2
  • Example 6: Model C3k3
  • Example 7: Model C3k2
  • proof
  • proof
  • proof
  • ...and 4 more