Convergence Analysis of Blurring Mean Shift

Ryoya Yamasaki; Toshiyuki Tanaka

TL;DR

This paper investigates the convergence properties of the blurring mean shift (BMS) algorithm by interpreting it as an optimization procedure in configuration space for the objective $L({\bm u})=\sum_{i,j}K\bigl(\frac{{\bm u}_i-{\bm u}_j}{h}\bigr)$. It establishes convergence guarantees even when the blurred data point sequences converge to multiple points (clusters), and it derives rate bounds under both smooth and non-smooth kernel regimes, using the Łojasiewicz framework and a graph-theoretic representation of the interactions among points. For smoothly truncated kernels (e.g., biweight, triweight) and certain non-smoothly truncated kernels (e.g., Epanechnikov, cosine), the configuration sequence $({\bm y}_t)$ converges to a stationary point, often at an exponential or cubic rate once the BMS graph becomes closed; for the Epanechnikov kernel in particular, convergence in finitely many steps is proved. The results indicate that BMS can achieve clustering more efficiently than the standard mean shift (MS) algorithm in many settings, and they rigorously link kernel properties, graph structure, and convergence behavior.

Abstract

The blurring mean shift (BMS) algorithm, a variant of the mean shift algorithm, is a kernel-based iterative method for data clustering in which data points are clustered according to their convergent points via iterative blurring. In this paper, we analyze the convergence properties of the BMS algorithm by leveraging its interpretation as an optimization procedure, which is known but has been underutilized in existing convergence studies. Whereas existing convergence results applicable to multi-dimensional data only cover the case where all the blurred data point sequences converge to a single point, this study provides a convergence guarantee even when those sequences can converge to multiple points, yielding multiple clusters. This study also shows that the convergence of the BMS algorithm is fast, by further leveraging a geometrical characterization of the convergent points.
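The iterative blurring described in the abstract can be illustrated with a minimal sketch. It assumes the truncated-flat weight $G({\bm u})\propto{\mathbbm{1}}(\|{\bm u}\|\le1)$ mentioned in the caption of Figure S1, and it stops when no point moves by more than a small tolerance rather than on exact equality. The names `bms_step`, `bms`, `tol`, and `max_iter`, as well as the toy data, are hypothetical and do not come from the paper.

```python
import math


def bms_step(points, h):
    """One blurring step: each point moves to the mean of the current points within distance h."""
    new_points = []
    for p in points:
        # Assumed truncated-flat weight: 1 if ||p - q|| <= h, else 0.
        # p is always its own neighbour, so the list is never empty.
        neighbours = [q for q in points if math.dist(p, q) <= h]
        new_points.append(
            tuple(sum(q[k] for q in neighbours) / len(neighbours) for k in range(len(p)))
        )
    return new_points


def bms(points, h, tol=1e-12, max_iter=100):
    """Iterate blurring steps until the largest displacement is at most tol."""
    y = [tuple(float(c) for c in p) for p in points]
    step = 0
    for step in range(1, max_iter + 1):
        y_next = bms_step(y, h)
        shift = max(math.dist(a, b) for a, b in zip(y, y_next))
        y = y_next  # blurring: the data set itself is replaced at every step
        if shift <= tol:
            break
    return y, step


# Two well-separated toy groups in the plane (illustrative data only).
data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
final, steps = bms(data, h=1.0)
clusters = {(round(x, 6), round(y, 6)) for x, y in final}
print(steps, sorted(clusters))
```

Unlike the standard mean shift algorithm, every step here recomputes the weights from the already blurred points, which is why the whole configuration, not a single query point, is the object that evolves.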

Paper Structure (33 sections, 25 theorems, 96 equations, 4 figures, 1 table)

Introduction
Preparation
Basic Assumptions on Kernel
Graph-Theoretic Representation
Review of Previous Work
Convergence Guarantee to a Single Point
Conditional Convergence Guarantee
One-Dimensional Convergence Guarantee
Cubic Convergence of Gaussian Population
Optimization View in Configuration Space
Convergence Analysis
Properties of Objective Function
Convergence Guarantee of Objective Sequence
Minorize-Maximize Algorithm
Convergence Guarantee
...and 18 more sections

Key Result

Proposition 1

Assume Assumption asm:RS. Then, for the profile $k$ of the kernel $K$ and the function $g$ in eq:funcG, one has

Figures (4)

Figure 1: Illustration of the BMS graphs. Each dot represents ${\bm{u}}_i\in{\mathbb{R}}^2$, and each circle represents the outer edge of the ball $\{{\bm{v}}\in{\mathbb{R}}^2\mid G(\frac{{\bm{u}}_i-{\bm{v}}}{h})>0\}$, with $i\in[3]$. For example, the BMS graph ${\mathcal{G}}_{\bm{u}}$ of ${\bm{u}}=({\bm{u}}_i)_{i\in[3]}$ in (ii) is such that $2$ and $3$ are joined and that $1$ is isolated. Also, ${\mathcal{G}}_{\bm{u}}$ is 'singular' in (i) and (iv), 'closed and non-singular' in (ii), 'open' in (iii), stable in (i)--(iii), and unstable in (iv).
Figure 2: Inclusion relation among important function classes relevant to the discussion on the Łojasiewicz property.
Figure S1: Data clustering results by the BMS algorithm for six datasets in https://scikit-learn.org/stable/modules/clustering.html. Each dataset has $n=500$ data points in ${\mathbb{R}}^2$ (i.e., $d=2$) and is standardized for use. We used the double-precision floating-point number format, the truncated-flat function $G({\bm{u}})\propto{\mathbbm{1}}(\|{\bm{u}}\|\le1)$ and truncated-quadratic function $G({\bm{u}})\propto(1-\|{\bm{u}}\|^2)_+$, and selected the bandwidth $h$, which yielded the smallest number of clusters without clustering data from clearly different underlying clusters into the same cluster, among the candidates $\{0.03, 0.06, \ldots, 2.97, 3.0\}$ (note that clusters with close ${\bm{y}}_{T,i}$'s can be further integrated with proper post-processing). Each plate shows $h$, terminated step $T=\min\{t\in{\mathbb{N}}\mid{\bm{y}}_{t,i}={\bm{y}}_{t+1,i}, \forall i\in[n]\}$, and number $M$ of clusters. It also shows initial points ${\bm{y}}_{1,i}$'s (data points ${\bm{x}}_i$'s) by large dots, intermediate points ${\bm{y}}_{t,i}$'s by small dots, terminated points ${\bm{y}}_{T,i}$'s by crosses, and trajectories by polylines, in different colors corresponding to clusters.
Figure S2: Behavior of the sequence $(r_t)_{t\in[10]}$ under the setting discussed in Section \ref{sec:SIMP} with the Gaussian kernel $K$, $n=2$, $h=1$, $r_1=0.99$, computed with 5000-digit precision. The black solid curve shows the simulation result, and the red dotted line shows the predicted slope $\log 3$ of the cubic convergence.
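The cubic behavior referred to in the caption of Figure S2 can be reproduced approximately in ordinary double precision. The sketch below rests on assumptions that are not confirmed by the text above: $r_t$ is taken to be the distance between the two blurred points, and the weight is taken to be $G({\bm u})=\exp(-\|{\bm u}\|^2/2)$. Under these assumptions one blurring step gives $r_{t+1}=r_t\tanh\bigl(r_t^2/(4h^2)\bigr)$, so $r_{t+1}\approx r_t^3/(4h^2)$ for small $r_t$. The function name `two_point_distances` is hypothetical.

```python
import math


def two_point_distances(r1, h, steps):
    """Distances between two points under BMS with an assumed Gaussian weight."""
    rs = [r1]
    for _ in range(steps - 1):
        r = rs[-1]
        # With weight g = exp(-r^2 / (2 h^2)) between the two points, the new
        # separation is r * (1 - g) / (1 + g) = r * tanh(r^2 / (4 h^2)).
        # The tanh form avoids the cancellation in 1 - g for tiny r.
        rs.append(r * math.tanh(r * r / (4.0 * h * h)))
    return rs


# Only a few steps fit in double precision before r_t underflows to zero.
rs = two_point_distances(r1=0.99, h=1.0, steps=6)
# For cubic convergence, log(r_{t+1}) / log(r_t) should approach 3.
ratios = [math.log(rs[t + 1]) / math.log(rs[t]) for t in range(1, len(rs) - 1)]
print(rs)
print(ratios)
```

The ratio of successive log-distances tending to 3 corresponds to the slope $\log 3$ mentioned in the caption; the 5000-digit precision used there is what allows ten steps to be shown, whereas this sketch is limited to about six.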

Theorems & Definitions (50)

Proposition 1: rockafellar1997convex
Definition 1: Truncated kernel and related notions
Definition 2: Configuration space
Definition 3: BMS graph
Definition 4: Characterization of the BMS graph
Proposition 2
Proposition 3: Upper bound of the number of components of the BMS graph
Theorem 1: Convergence guarantee and rate bound for non-truncated kernels or a large bandwidth; cheng1995mean
Theorem 2: Conditional convergence guarantee and rate bound; extension of Theorem 1
Proposition 4: Invariance of the objective function
...and 40 more
