
CavMerge: Merging K-means Based on Local Log-Concavity

Zhili Qiao, Wangqian Ju, Peng Liu

Abstract

K-means clustering, a classic and widely used clustering technique, is known to perform suboptimally on data that are not linearly separable. Numerous adjustments and modifications have been proposed to address this issue, including methods that merge the clusters produced by K-means with a relatively large K into a final cluster assignment. However, existing methods of this kind are often computationally inefficient and depend on hyperparameter tuning. Here we present CavMerge, a novel K-means merging algorithm that is intuitive, free of parameter tuning, and computationally efficient. Under minimal local distributional assumptions, the algorithm enjoys strong consistency and rapid convergence guarantees. Empirical studies on various simulated and real datasets show that our method yields more reliable clusters than current state-of-the-art algorithms.
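The abstract describes merging the clusters of an over-specified K-means run. The Python sketch below illustrates that general over-cluster-then-merge pipeline; it is not the CavMerge algorithm, whose actual merging criterion is not given here. Everything specific in it is a hypothetical assumption: 2-D points, Lloyd's algorithm with random initial centres, calling two clusters adjacent when some point has both centroids as its two nearest, and a merge rule built on one consequence of log-concavity. If a density $f$ is log-concave along the segment between centroids $a$ and $b$, then $\log f$ is concave there, so $f\big(\tfrac{a+b}{2}\big) \ge \sqrt{f(a)\,f(b)}$; the sketch merges an adjacent pair only when a crude ball-count density estimate satisfies this inequality up to a tolerance. The radius `h` and tolerance `tol` are illustrative parameters of this sketch only, whereas the paper states that CavMerge itself requires no tuning.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm with random initial centres; returns (centres, labels)."""
    centres = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        # Update step: each non-empty cluster's centre moves to its members' mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return centres, labels


def ball_count(points, centre, h):
    """Crude density proxy: number of points within radius h of `centre`."""
    return sum(1 for p in points if sq_dist(p, centre) <= h * h)


def midpoint_check(points, a, b, h, tol):
    """Hypothetical merge rule: log-concavity along [a, b] implies
    f(midpoint) >= sqrt(f(a) * f(b)); `tol` < 1 absorbs estimation noise."""
    fa, fb = ball_count(points, a, h), ball_count(points, b, h)
    if fa == 0 or fb == 0:
        return False
    mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    return ball_count(points, mid, h) >= tol * math.sqrt(fa * fb)


def merge_kmeans(points, k=6, h=1.0, tol=0.5):
    """Over-cluster with K-means, then union adjacent clusters that pass the check."""
    centres, labels = kmeans(points, k)
    # Adjacent pairs (assumed definition): some point has both centres as its two nearest.
    pairs = set()
    for p in points:
        a, b = sorted(range(k), key=lambda j: sq_dist(p, centres[j]))[:2]
        pairs.add((min(a, b), max(a, b)))
    parent = list(range(k))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in pairs:
        if midpoint_check(points, centres[a], centres[b], h, tol):
            parent[find(a)] = find(b)
    # Final label of each point = root of its K-means cluster.
    return [find(lab) for lab in labels]


if __name__ == "__main__":
    rng = random.Random(1)
    blob1 = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    blob2 = [(rng.gauss(10, 1), rng.gauss(0, 1)) for _ in range(200)]
    final = merge_kmeans(blob1 + blob2)
    print(f"{len(set(final))} clusters after merging 6 K-means clusters")
```

Union-find makes the merges transitive: clusters joined through a chain of passing adjacent pairs share a final label, while a pair whose midpoint falls in an empty gap, such as the gap between the two synthetic blobs above, fails the inequality and stays separate.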

Paper Structure

This paper contains 33 sections, 2 theorems, 14 equations, 4 figures, 3 tables.

Key Result

Proposition 1

Under the manifold hypothesis that the data concentrate near a $(p-1)$-dimensional submanifold of $\mathbb{R}^p$, the number of adjacent cluster pairs is $O(K)$ rather than $O(K^2)$.
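Proposition 1 can be checked numerically in the simplest case $p = 2$, where a $(p-1)$-dimensional submanifold is a curve. The Python sketch below samples points on a closed, slightly wobbly curve, places $K$ centres evenly along the unit circle as a hypothetical stand-in for K-means centroids, and counts adjacent pairs under an assumed definition (two clusters are adjacent when some point has their centres as its two nearest); the paper's own adjacency definition may differ.

```python
import math


def adjacent_pairs(points, centres):
    """Pairs (a, b), a < b, such that some point has centres a and b as its two nearest."""
    pairs = set()
    for px, py in points:
        order = sorted(range(len(centres)),
                       key=lambda j: (px - centres[j][0]) ** 2 + (py - centres[j][1]) ** 2)
        a, b = order[0], order[1]
        pairs.add((min(a, b), max(a, b)))
    return pairs


def curve_data(n=1000):
    """Points on a wobbly closed curve in R^2, i.e. a 1-dimensional manifold (p = 2)."""
    pts = []
    for i in range(n):
        t = 2 * math.pi * i / n
        r = 1.0 + 0.05 * math.sin(5 * t)  # radius stays positive
        pts.append((r * math.cos(t), r * math.sin(t)))
    return pts


def evenly_spaced_centres(k):
    """Hypothetical stand-in for K-means centroids: k points evenly spaced on the unit circle."""
    return [(math.cos(2 * math.pi * j / k), math.sin(2 * math.pi * j / k)) for j in range(k)]


if __name__ == "__main__":
    data = curve_data()
    for k in (5, 10, 20, 40):
        n_adj = len(adjacent_pairs(data, evenly_spaced_centres(k)))
        print(f"K={k:3d}  adjacent pairs={n_adj:4d}  all pairs={k * (k - 1) // 2:4d}")
```

Because the distance from a point on the curve to a centre on the circle grows with their angular separation, each point's two nearest centres are angular neighbours, so in this setup the count is exactly $K$ for every $K \ge 3$, compared with $K(K-1)/2$ pairs in total.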

Figures (4)

  • Figure 1: 2D illustration of step 3.
  • Figure 2: Fifteen 2D datasets used for performance evaluations.
  • Figure 3: Performance of CavMerge on fifteen 2D datasets.
  • Figure 4: Visualization for: (a) 28 initial K-means clusters; (b) Merging results for CavMerge; (c) Merging results for Skeleton Clustering.

Theorems & Definitions (6)

  • Proof
  • Proposition 1: Linear adjacency bound
  • Proof sketch
  • Proof
  • Proposition 2
  • Proof