A Non-Parametric Bootstrap for Spectral Clustering
Liam Welsh, Phillip Shreeves
TL;DR
The paper tackles overfitting and sub-optimal convergence in finite mixture models by combining spectral data transformations with the non-parametric bootstrap. It introduces two bootstrap-augmented spectral clustering algorithms, Spectral-BootEM and BootSpectral, plus a tailored convergence criterion, to yield more robust and consistent solutions than existing bootstrapped EM methods, with substantial speed advantages in high-dimensional settings. Across extensive simulations (mirror and cross-over data) and a real Raman spectroscopy dataset, the proposed methods reduce overfitting, provide reliable out-of-bag membership estimates, and maintain competitive likelihoods, demonstrating practical utility for high-dimensional clustering and probabilistic mixture modeling. The work advances scalable, parameter-efficient clustering of complex data, makes the trade-off between computational cost and exploration of the latent space explicit, and improves interpretability through stable, probabilistic cluster memberships.
Abstract
Finite mixture modelling is a popular clustering method, valued largely for its soft cluster membership probabilities. A common method for fitting finite mixture models is to employ spectral clustering, which can utilize the expectation-maximization (EM) algorithm. However, the EM algorithm suffers from a number of issues, including convergence to sub-optimal solutions. We address this issue by developing two novel algorithms that incorporate the spectral decomposition of the data matrix and a non-parametric bootstrap sampling scheme. Simulations support the validity of our algorithms and demonstrate not only their flexibility, but also their computational efficiency and ability to avoid poor solutions when compared to other clustering algorithms for estimating finite mixture models. Our techniques also converge more consistently than other bootstrapped algorithms that fit finite mixture models.
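As a rough illustration of the non-parametric bootstrap sampling scheme mentioned above, the sketch below draws one resample of row indices with replacement and records which rows were left out (the "out-of-bag" rows). It is not the paper's Spectral-BootEM or BootSpectral algorithm; the function name `bootstrap_split` and its parameters are hypothetical and chosen only for this example.

```python
import random


def bootstrap_split(n_rows, seed=0):
    """Draw one non-parametric bootstrap resample of row indices.

    Hypothetical helper for illustration only (not from the paper).
    Returns (in_bag, out_of_bag): in_bag holds n_rows indices sampled
    with replacement; out_of_bag lists the indices never drawn.
    """
    rng = random.Random(seed)
    # Sample n_rows indices uniformly with replacement.
    in_bag = rng.choices(range(n_rows), k=n_rows)
    drawn = set(in_bag)
    # Rows that never appear in the resample are out of bag.
    out_of_bag = [i for i in range(n_rows) if i not in drawn]
    return in_bag, out_of_bag


in_bag, out_of_bag = bootstrap_split(n_rows=10, seed=1)
```

In a bootstrapped mixture-model fit, a model would be estimated on the in-bag rows and then used to score the out-of-bag rows, which is one way to obtain membership estimates for observations the fit never saw. For large samples, roughly 36.8% of rows are expected to be out of bag in any single resample, since each row is missed with probability (1 - 1/n)^n, which approaches 1/e.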
