A Non-Parametric Bootstrap for Spectral Clustering

Liam Welsh, Phillip Shreeves

TL;DR

The paper tackles overfitting and sub-optimal convergence in finite mixture models by fusing spectral data transformations with a non-parametric bootstrap. It introduces two bootstrap-augmented spectral clustering algorithms, Spectral-BootEM and BootSpectral, plus a tailored convergence criterion, to yield more robust and consistent solutions than existing bootstrapped EM methods, with substantial speed advantages in high-dimensional settings. Across extensive simulations (mirror and cross-over data) and a real Raman spectroscopy dataset, the proposed methods reduce overfitting, provide reliable out-of-bag membership estimates, and maintain competitive likelihoods, demonstrating practical utility for high-dimensional clustering and probabilistic mixture modeling. The work advances scalable, parameter-efficient clustering in complex data, offering clear trade-offs between computational cost and exploration of the latent space, and providing tools that improve interpretability through stable, probabilistic cluster memberships.

Abstract

Finite mixture modelling is a popular method in the field of clustering and is beneficial largely due to its soft cluster membership probabilities. A common method for fitting finite mixture models is to employ spectral clustering, which can utilize the expectation-maximization (EM) algorithm. However, the EM algorithm falls victim to a number of issues, including convergence to sub-optimal solutions. We address this issue by developing two novel algorithms that incorporate the spectral decomposition of the data matrix and a non-parametric bootstrap sampling scheme. Simulations display the validity of our algorithms and demonstrate not only their flexibility, but also their computational efficiency and ability to avoid poor solutions when compared to other clustering algorithms for estimating finite mixture models. Our techniques are more consistent in their convergence when compared to other bootstrapped algorithms that fit finite mixture models.
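
The abstract does not spell out the sampling scheme itself. As a rough illustration of what a non-parametric bootstrap with out-of-bag observations generally involves, the sketch below draws one resample of row indices with replacement and records which rows were never drawn. The function name `bootstrap_split` and the sample size are hypothetical choices for this example and are not taken from the paper; the spectral transformation and the EM fit are deliberately left out.

```python
import random

def bootstrap_split(n, rng):
    """Draw one non-parametric bootstrap sample over n observations.

    Hypothetical helper for illustration only (not from the paper).
    Returns (in_bag, out_of_bag): in_bag holds n indices drawn uniformly
    with replacement; out_of_bag holds the indices that were never drawn.
    """
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in drawn]
    return in_bag, out_of_bag

# Seeded generator so the resample is reproducible.
rng = random.Random(0)
in_bag, out_of_bag = bootstrap_split(10, rng)
```

In a bootstrapped mixture-model fit, the model would typically be estimated on the in-bag rows, and membership probabilities for the out-of-bag rows would then be computed from the fitted parameters. How Spectral-BootEM and BootSpectral combine these steps is not shown here.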

Paper Structure

This paper contains 21 sections, 13 equations, 4 figures, 4 tables, and 4 algorithms.

Figures (4)

  • Figure 1: Original (left) and transformed (right) mirror data.
  • Figure 2: Cross-over data prior to spectral transformation.
  • Figure 3: Convergence of bootstrapped spectral algorithms with varying $\epsilon_B$.
  • Figure 4: A demonstration of observations in the Raman spectroscopy data set. Observations are coloured in grey, whereas the group means are displayed in red, green, and blue.