Convergence Analysis of Blurring Mean Shift

Ryoya Yamasaki, Toshiyuki Tanaka

TL;DR

This paper investigates convergence properties of the blurring mean shift (BMS) algorithm by interpreting it as an optimization procedure on the configuration space for the objective $L({\bm u})=\sum_{i,j}K\bigl(\frac{{\bm u}_i-{\bm u}_j}{h}\bigr)$. It establishes convergence guarantees even when the blurred data point sequences converge to multiple points (clusters), and it derives rate bounds under both smooth and non-smooth kernel regimes by leveraging the Łojasiewicz framework and a graph-theoretic representation of the interactions among points. It shows that, for smoothly truncated kernels (e.g., biweight, triweight) and certain non-smoothly truncated kernels (e.g., Epanechnikov, cosine), the configuration sequence $({\bm y}_t)$ converges to a stationary point, often at an exponential or cubic rate once the BMS graph becomes closed; for the Epanechnikov kernel in particular, finite-time convergence is proved. The results indicate that BMS can achieve clustering more efficiently than standard MS in many settings, and they rigorously link kernel properties, graph structure, and convergence behavior.

Abstract

The blurring mean shift (BMS) algorithm, a variant of the mean shift algorithm, is a kernel-based iterative method for data clustering, where data points are clustered according to their convergent points via iterative blurring. In this paper, we analyze convergence properties of the BMS algorithm by leveraging its interpretation as an optimization procedure, which is known but has been underutilized in existing convergence studies. Whereas existing convergence results applicable to multi-dimensional data only cover the case where all the blurred data point sequences converge to a single point, this study provides a convergence guarantee even when those sequences can converge to multiple points, yielding multiple clusters. This study also shows that the convergence of the BMS algorithm is fast by further leveraging a geometrical characterization of the convergent points.
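
The iterative blurring described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard BMS update, in which every point is replaced by the weighted mean of all current points with weights $G(({\bm y}_{t,i}-{\bm y}_{t,j})/h)$, and it uses the truncated-quadratic function $G({\bm u})\propto(1-\|{\bm u}\|^2)_+$ mentioned in the Figure S1 caption. The function names (`truncated_quadratic`, `bms_step`, `bms`) and the tolerance-based stopping rule are assumptions made for this sketch; Figure S1 itself stops on exact equality of consecutive iterates.

```python
import numpy as np


def truncated_quadratic(sq_norm):
    """G(u) proportional to (1 - ||u||^2)_+, evaluated from squared norms."""
    return np.maximum(1.0 - sq_norm, 0.0)


def bms_step(y, h, weight=truncated_quadratic):
    """One blurring step: move every point to the weighted mean of all points."""
    diff = y[:, None, :] - y[None, :, :]          # pairwise differences y_i - y_j
    sq_norm = np.sum(diff ** 2, axis=-1) / h**2   # ||(y_i - y_j) / h||^2
    w = weight(sq_norm)                           # w[i, j] = G((y_i - y_j) / h)
    # Each row sum is positive because w[i, i] = G(0) = 1.
    return (w @ y) / w.sum(axis=1, keepdims=True)


def bms(x, h, max_iter=1000, tol=1e-12):
    """Iterate the blurring step until no coordinate moves by more than tol."""
    y = np.asarray(x, dtype=float)
    for t in range(1, max_iter + 1):
        y_next = bms_step(y, h)
        if np.max(np.abs(y_next - y)) <= tol:
            return y_next, t
        y = y_next
    return y, max_iter


# Hypothetical toy data: two tight pairs that are far apart relative to h.
x_demo = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y_demo, steps_demo = bms(x_demo, h=1.0)
```

In this toy example the two pairs never interact because their mutual weights are zero, so each pair collapses to its own point and the result is two clusters. Points can then be grouped by (near-)equality of the rows of `y_demo`.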

Paper Structure (33 sections, 25 theorems, 96 equations, 4 figures, 1 table)

This paper contains 33 sections, 25 theorems, 96 equations, 4 figures, 1 table.

Key Result

Proposition 1

Assume Assumption asm:RS. Then, for the profile $k$ of the kernel $K$ and the function $g$ in eq:funcG, one has

Figures (4)

  • Figure 1: Illustration of the BMS graphs. Each dot represents ${\bm{u}}_i\in{\mathbb{R}}^2$, and each circle represents the outer edge of the ball $\{{\bm{v}}\in{\mathbb{R}}^2\mid G(\frac{{\bm{u}}_i-{\bm{v}}}{h})>0\}$, with $i\in[3]$. For example, the BMS graph ${\mathcal{G}}_{\bm{u}}$ of ${\bm{u}}=({\bm{u}}_i)_{i\in[3]}$ in (ii) is such that $2$ and $3$ are joined and that $1$ is isolated. Also, ${\mathcal{G}}_{\bm{u}}$ is 'singular' in (i) and (iv), 'closed and non-singular' in (ii), 'open' in (iii), stable in (i)--(iii), and unstable in (iv).
  • Figure 2: Inclusion relation among important function classes relevant to the discussion on the Łojasiewicz property.
  • Figure S1: Data clustering results by the BMS algorithm for six datasets in https://scikit-learn.org/stable/modules/clustering.html. Each dataset has $n=500$ data points in ${\mathbb{R}}^2$ (i.e., $d=2$) and is standardized for use. We used the double-precision floating-point number format, the truncated-flat function $G({\bm{u}})\propto{\mathbbm{1}}(\|{\bm{u}}\|\le1)$ and truncated-quadratic function $G({\bm{u}})\propto(1-\|{\bm{u}}\|^2)_+$, and selected the bandwidth $h$, which yielded the smallest number of clusters without clustering data from clearly different underlying clusters into the same cluster, among the candidates $\{0.03, 0.06, \ldots, 2.97, 3.0\}$ (note that clusters with close ${\bm{y}}_{T,i}$'s can be further integrated with proper post-processing). Each plate shows $h$, terminated step $T=\min\{t\in{\mathbb{N}}\mid{\bm{y}}_{t,i}={\bm{y}}_{t+1,i}, \forall i\in[n]\}$, and number $M$ of clusters. It also shows initial points ${\bm{y}}_{1,i}$'s (data points ${\bm{x}}_i$'s) by large dots, intermediate points ${\bm{y}}_{t,i}$'s by small dots, terminated points ${\bm{y}}_{T,i}$'s by crosses, and trajectories by polylines, in different colors corresponding to clusters.
  • Figure S2: Behavior of the sequence $(r_t)_{t\in[10]}$ under the setting discussed in Section \ref{sec:SIMP} with the Gaussian kernel $K$, $n=2$, $h=1$, $r_1=0.99$, computed with 5000-digit precision. The black solid curve shows the simulation result, and the red dotted line shows the predicted slope $\log 3$ of the cubic convergence.
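
The cubic rate mentioned in the Figure S2 caption can be reproduced with a short high-precision simulation. The sketch below rests on assumptions that are not stated in this summary: it takes $r_t$ to be the distance between the two points (the paper may normalize it differently) and uses the standard BMS update with Gaussian weights. Under those assumptions, with $w=\exp(-r_t^2/(2h^2))$, the two-point update reduces to $r_{t+1}=r_t\,(1-w)/(1+w)$, which behaves like $r_t^3/(4h^2)$ for small $r_t$. The helper name `two_point_distances` is hypothetical.

```python
from decimal import Decimal, getcontext

# 5000 significant digits, matching the precision quoted in the caption.
# Double precision would underflow to zero after a handful of steps.
getcontext().prec = 5000


def two_point_distances(r1, h="1", steps=10):
    """Distances r_1, ..., r_steps between two points under Gaussian BMS.

    Assumes r_t is the inter-point distance; this is an assumption of the sketch.
    """
    r, h = Decimal(r1), Decimal(h)
    out = [r]
    for _ in range(steps - 1):
        w = (-(r / h) ** 2 / 2).exp()   # Gaussian weight between the two points
        r = r * (1 - w) / (1 + w)       # contraction of the distance in one step
        out.append(r)
    return out


rs = two_point_distances("0.99")
# Empirical order ln(r_{t+1}) / ln(r_t); the first ratio is skipped because
# ln(r_1) is close to zero. Values near 3 indicate cubic convergence.
orders = [float(rs[t + 1].ln() / rs[t].ln()) for t in range(1, len(rs) - 1)]
```

An order near 3 corresponds to $\log|\log r_t|$ growing by about $\log 3$ per step, which is the slope quoted in the caption.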

Theorems & Definitions (50)

  • Proposition 1: rockafellar1997convex
  • Definition 1: Truncated kernel and related notions
  • Definition 2: Configuration space
  • Definition 3: BMS graph
  • Definition 4: Characterization of the BMS graph
  • Proposition 2
  • Proposition 3: Upper bound of the number of components of the BMS graph
  • Theorem 1: Convergence guarantee and rate bound for non-truncated kernels or a large bandwidth; cheng1995mean
  • Theorem 2: Conditional convergence guarantee and rate bound; extension of Theorem 1
  • Proposition 4: Invariance of the objective function
  • ...and 40 more
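
The BMS graph of Definition 3 can be illustrated with a small sketch. The definition itself is not reproduced in this summary, so the code relies on an assumption inferred from the Figure 1 caption: vertices $i$ and $j$ are joined when $G(({\bm u}_i-{\bm u}_j)/h)>0$, which for a truncated $G$ supported on the open unit ball means $\|{\bm u}_i-{\bm u}_j\|<h$. The function name `bms_graph_components` and the sample configuration are hypothetical.

```python
import numpy as np


def bms_graph_components(u, h):
    """Connected components of the graph joining i and j when ||u_i - u_j|| < h.

    The edge rule is an assumption inferred from the Figure 1 caption.
    """
    u = np.asarray(u, dtype=float)
    n = len(u)
    dist = np.linalg.norm(u[:, None, :] - u[None, :, :], axis=-1)
    adjacent = dist < h                     # adjacent[i, j] is True when G > 0
    label = [-1] * n                        # component index per vertex
    components = []
    for start in range(n):
        if label[start] != -1:
            continue
        label[start] = len(components)
        stack, members = [start], []
        while stack:                        # depth-first search from `start`
            i = stack.pop()
            members.append(i)
            for j in np.flatnonzero(adjacent[i]):
                if label[j] == -1:
                    label[j] = label[start]
                    stack.append(int(j))
        components.append(sorted(members))
    return components


# Hypothetical configuration resembling the description of panel (ii):
# the first point is isolated, the other two are joined (0-based indices).
components_demo = bms_graph_components([[0.0, 0.0], [3.0, 0.0], [3.5, 0.0]], h=1.0)
```

Each component in the result is a group of points that can still influence one another under the assumed edge rule; points in different components have zero mutual weight in the current configuration.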