Convergence and clustering analysis for Mean Shift with radially symmetric, positive definite kernels

Susovan Pal

Abstract

The mean shift (MS) algorithm is a non-parametric, density-based, iterative algorithm widely used in clustering and image segmentation. A rigorous proof of the convergence of its mode estimate sequence in full generality remains unknown. In this paper, we show that for \textit{sufficiently large bandwidth}, convergence is guaranteed in any dimension for \textit{any radially symmetric and strictly positive definite kernel}. Although our result is partially more restrictive than that of \cite{YT} because of the lower bound on the bandwidth, our kernel class is not covered by the kernel class in \cite{YT}, and the proof technique is different. Moreover, we show theoretically and experimentally that, while accurate clustering at \textit{large bandwidths} is generally impossible for the Gaussian kernel, it may still be possible for other radially symmetric, strictly positive definite kernels.

Paper Structure

This paper contains 38 sections, 11 theorems, 95 equations, 5 figures.

Key Result

Theorem 1

Consider the MS algorithm with the radially symmetric, strictly positive definite kernel $K(x,y) := k(\left\lVert x-y\right\rVert^2)$ given by the kernel profile function $k$. Then there exists $h_0 > 0$, depending only on the kernel profile $k$, its first two derivatives, and the maximum norm $\left\l

Figures (5)

  • Figure 1: Mean-shift with the Gaussian kernel converges, but identifies an incorrect number of clusters at the theoretically prescribed bandwidth.
  • Figure 2: Large-bandwidth experiment $h = 10\|x_{\max}\|$: Gaussian collapse versus Laplace and Cauchy-type kernels for synthetic data.
  • Figure 3: Large-bandwidth experiment $h = 10\|x_{\max}\|$ on Iris: the Gaussian and Laplace kernels collapse, while the Cauchy-type kernel does not.
  • Figure 4: Mean-shift trajectories in the PCA$(2)$ projection of Wheat Seed data at $H = 10\max_i \|x_i\|$.
  • Figure 5: Grayscale elephant image and its segmentation obtained using the Cauchy-type kernel.
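
The large-bandwidth setting named in the figure captions can be illustrated with a minimal sketch of the standard mean-shift fixed-point iteration. This is not the author's code: the toy data, the helper names (`mean_shift_step`, `mean_shift`, `gaussian_g`), the Gaussian profile $k(r) = e^{-r/2}$, and the convention that the bandwidth enters as $\|y - x_i\|^2 / h^2$ are all assumptions made for illustration. It only shows the Gaussian collapse to a single mode at $h = 10\max_i \|x_i\|$, and makes no claim about the other kernels.

```python
import math

def mean_shift_step(y, data, h, g):
    """One mean-shift update: weighted mean of the data,
    with weights g(||y - x_i||^2 / h^2), where g = -k' for a profile k."""
    weights = [g(sum((a - b) ** 2 for a, b in zip(y, x)) / h ** 2) for x in data]
    total = sum(weights)
    return tuple(
        sum(w * x[d] for w, x in zip(weights, data)) / total
        for d in range(len(y))
    )

def mean_shift(y0, data, h, g, tol=1e-10, max_iter=10_000):
    """Iterate the update until successive estimates move less than tol."""
    y = tuple(y0)
    for _ in range(max_iter):
        y_next = mean_shift_step(y, data, h, g)
        if math.dist(y, y_next) < tol:
            return y_next
        y = y_next
    return y

def gaussian_g(r):
    # Assumed Gaussian profile k(r) = exp(-r / 2); g = -k' up to a constant
    # factor, which cancels in the weighted mean.
    return math.exp(-r / 2)

# Hypothetical toy data: two well-separated pairs of points in the plane.
data = [(-2.0, 0.0), (-1.8, 0.2), (2.0, 0.0), (1.9, -0.1)]

# Large bandwidth, following the scaling used in the figure captions.
h = 10 * max(math.hypot(*x) for x in data)

# Run mean shift from every data point and collect the mode estimates.
modes = [mean_shift(x, data, h, gaussian_g) for x in data]
```

With this bandwidth every starting point ends at essentially the same mode estimate, close to the sample mean, so the two visually separate groups are merged into one cluster. Replacing `gaussian_g` with the negative derivative of another profile gives the corresponding iteration for that kernel, provided the profile and bandwidth convention match the ones used in the paper.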

Theorems & Definitions (22)

  • Definition: Positive semidefinite and positive definite square matrices
  • Definition: Positive and strictly positive definite functions
  • Theorem 1
  • Lemma 1
  • Definition: Completely monotone functions
  • Proposition 1: Hausdorff--Bernstein--Widder representation
  • Proposition 2: Schoenberg characterization and strictness criterion
  • Remark 1
  • Proposition 3: Modified Proposition 1
  • proof
  • ...and 12 more