Spectral clustering algorithm for the allometric extension model
Kohei Kawamoto, Yuichi Goto, Koji Tsukuda
TL;DR
This paper addresses binary clustering under an allometric extension model, in which the leading eigenvectors of the two class covariance matrices and the difference of the two mean vectors share a common direction. It derives a non-asymptotic bound on the misclassification probability of a spectral clustering algorithm that uses the top eigenvector of the sample covariance, and it establishes high-dimensional consistency as n and m grow under suitable conditions. The analysis relies on sub-Gaussian properties of the data, concentration of the sample covariance around its expectation, and eigenvector perturbation bounds that relate the estimated and population eigenvectors. Performance is governed by the signal-to-noise ratio eta = ||mu||^2 / max_j lambda_1(Sigma_j). The results extend spectral clustering theory beyond homoscedastic assumptions and provide finite-sample guarantees for clustering under heteroscedastic allometric relations.
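For orientation, the model and the signal-to-noise ratio described above can be written out as follows. This is a sketch, not the paper's own notation: the symbols \mu_j, \gamma, and c are introduced here for illustration, and the reading of \mu as a common offset with class means at \pm\mu is an assumption suggested by the form of eta rather than something stated in this summary.

```latex
% Class j = 1, 2 has mean \mu_j and covariance \Sigma_j (illustrative notation).
% Allometric extension: a common leading eigenvector (up to sign),
% aligned with the difference of the means.
\gamma_1(\Sigma_1) = \gamma_1(\Sigma_2) = \gamma,
\qquad
\mu_1 - \mu_2 = c\,\gamma \quad \text{for some scalar } c \neq 0.

% Signal-to-noise ratio as given in the summary,
% assuming here that \mu_1 = \mu and \mu_2 = -\mu.
\eta = \frac{\lVert \mu \rVert^{2}}{\max_{j \in \{1,2\}} \lambda_1(\Sigma_j)}
```

Homoscedasticity would additionally require \Sigma_1 = \Sigma_2; the allometric extension model constrains only the leading direction, so the leading eigenvalues \lambda_1(\Sigma_1) and \lambda_1(\Sigma_2) may differ.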
Abstract
The spectral clustering algorithm is often used as a binary clustering method for unclassified data by applying principal component analysis. Existing studies of its theoretical properties often assume conditional homoscedasticity. However, this assumption is restrictive and often unrealistic in practice. In this paper, we therefore consider the allometric extension model, in which the directions of the first eigenvectors of the two covariance matrices and the direction of the difference of the two mean vectors coincide, and we provide a non-asymptotic bound on the error probability of the spectral clustering algorithm under this model. As a byproduct, we obtain the consistency of the clustering method in high-dimensional settings.
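The procedure summarized above can be illustrated with a minimal NumPy sketch: project the data onto the top eigenvector of the sample covariance and split by the sign of the projection. Everything here is an assumption made for illustration and is not taken from the paper: the function names `simulate`, `spectral_cluster`, and `clustering_error`, the Gaussian data, the choice of the first coordinate axis as the common leading direction, the centering by the sample mean, and all parameter values.

```python
import numpy as np


def simulate(n=400, p=20, signal=3.0, lam=(1.0, 2.0), noise_var=0.25, seed=0):
    """Draw Gaussian data from a toy allometric extension model (hypothetical setup).

    Class j has covariance diag(lam[j], noise_var, ..., noise_var), so both
    covariances share the leading eigenvector e_1, and the class means
    +signal * e_1 and -signal * e_1 differ along that same direction.
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)                 # true labels, 0 or 1
    X = np.sqrt(noise_var) * rng.standard_normal((n, p))
    lead_std = np.sqrt(np.asarray(lam))[y]         # class-specific leading std
    sign = 1.0 - 2.0 * y                           # +1 for class 0, -1 for class 1
    X[:, 0] = sign * signal + lead_std * rng.standard_normal(n)
    return X, y


def spectral_cluster(X):
    """Label each row by the sign of its projection onto the top eigenvector."""
    Xc = X - X.mean(axis=0)                        # centering: an assumption of this sketch
    S = Xc.T @ Xc / X.shape[0]                     # p x p sample covariance
    _, eigvecs = np.linalg.eigh(S)                 # eigenvalues come back in ascending order
    v = eigvecs[:, -1]                             # eigenvector of the largest eigenvalue
    return (Xc @ v > 0).astype(int)


def clustering_error(pred, y):
    """Misclassification rate, minimized over the two possible label swaps."""
    err = float(np.mean(pred != y))
    return min(err, 1.0 - err)


if __name__ == "__main__":
    X, y = simulate()
    pred = spectral_cluster(X)
    print("misclassification rate:", clustering_error(pred, y))
```

In this toy setup the two leading eigenvalues differ (1.0 versus 2.0), so the data are heteroscedastic, and the ratio eta from the summary equals 3.0^2 / 2.0 = 4.5 under the assumed reading of mu. Lowering `signal` or raising `lam` reduces that ratio and should make the clustering error grow.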
