Stochastic Mean-Shift Clustering
Itshak Lapidot, Yann Sepulcre, Tom Trigano
TL;DR
This paper introduces Stochastic Mean Shift (SMS), a stochastic, asynchronous variant of mean-shift clustering in which a randomly chosen data point is updated at each iteration via a mean-shift gradient step. The authors establish theoretical properties, including a non-decreasing kernel density estimate (KDE) objective $L$ and eventual clustering with vanishing cluster diameters, and provide practical convergence criteria. Empirically, SMS performs on par with or better than deterministic mean shift (MS) and often matches or beats Blurring Mean Shift (BMS) on synthetic multi-modal data, while offering linear per-update complexity and better scalability to large datasets. The approach is validated on speaker clustering tasks using PLDA-based distances, illustrating its practical relevance for diarization and other high-dimensional clustering problems. Overall, SMS offers a scalable alternative to classical mean-shift methods, backed by convergence results.
Abstract
We present a stochastic version of the mean-shift clustering algorithm. In this version, data points chosen in a random sequence move according to partial gradient ascent steps of the objective function. We provide theoretical results on the convergence of the proposed approach and evaluate its performance on synthetic 2-dimensional samples generated from a Gaussian mixture distribution, comparing it with state-of-the-art methods. In most cases, stochastic mean-shift clustering outperforms the standard mean-shift. As a practical application, we also illustrate the use of the method for speaker clustering.
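To make the update rule concrete, the following is a minimal sketch of one possible reading of the stochastic step: at each iteration a single point is drawn at random and replaced by the kernel-weighted mean of the current point set. The Gaussian kernel, the fixed update budget, and the names `gaussian_weight`, `stochastic_mean_shift`, `bandwidth`, and `n_updates` are assumptions made for illustration; the paper's exact update, kernel, and stopping criteria may differ.

```python
import math
import random


def gaussian_weight(p, q, bandwidth):
    """Unnormalised Gaussian kernel weight between points p and q (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def stochastic_mean_shift(points, bandwidth, n_updates, seed=0):
    """Illustrative stochastic mean shift: update one random point per iteration.

    Each update costs one pass over the data, i.e. it is linear in the
    number of points. A fixed budget of n_updates stands in for a real
    convergence criterion (an assumption of this sketch).
    """
    rng = random.Random(seed)
    pts = [list(p) for p in points]  # work on a copy; the input is left untouched
    dim = len(pts[0])
    for _ in range(n_updates):
        i = rng.randrange(len(pts))  # pick the point to move
        weights = [gaussian_weight(pts[i], q, bandwidth) for q in pts]
        total = sum(weights)  # at least 1.0, since the point weights itself by 1
        # Move the chosen point to the kernel-weighted mean of the current set.
        pts[i] = [
            sum(w * q[d] for w, q in zip(weights, pts)) / total
            for d in range(dim)
        ]
    return pts


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3), (10.0, 10.0), (10.2, 9.9)]
    for p in stochastic_mean_shift(data, bandwidth=1.0, n_updates=500):
        print(p)
```

In this sketch the points interact with the evolving set rather than with the fixed original sample, so well-separated groups contract toward a single location each; cluster labels could then be read off by grouping points that end up within a small tolerance of one another.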
