Accelerated decomposition of bistochastic kernel matrices by low rank approximation
Chris Vales, Dimitrios Giannakis
TL;DR
This work tackles the computational bottleneck of obtaining the eigen-decomposition of bistochastic kernel matrices for large datasets. It introduces a rank-$r$ pivoted partial Cholesky-based strategy to form a low-rank approximation $\tilde{K}=F F^\top$ and then computes the approximate eigenpairs of the bistochastic matrix $\tilde{P}$ with cost $O(N r^2)$, requiring only $N(r+1)$ kernel evaluations. Two acceleration schemes are developed and compared: dilution, which leverages the full dataset information via a sequence of small $r \times r$ factorizations, and subsampling with Nyström extension, which is more parallelizable but incurs higher asymptotic cost. The methods are applied to kernel-based spatiotemporal pattern extraction in chaotic Kuramoto-Sivashinsky dynamics, demonstrating close agreement with true eigenfunctions and highlighting practical trade-offs between accuracy and scalability. Overall, the proposed approach expands the applicability of bistochastic kernel methods to large-scale problems, enabling efficient diffusion-map–style analyses and kernel spectral clustering on big data.
Abstract
We develop an accelerated algorithm for computing an approximate eigenvalue decomposition of bistochastically normalized kernel matrices. Our approach constructs a low-rank approximation of the original kernel matrix with the pivoted partial Cholesky algorithm and uses it to compute an approximate decomposition of the bistochastic normalization without forming the full kernel matrix. The cost of the proposed algorithm depends linearly on the size of the training dataset and quadratically on the rank of the low-rank approximation, offering a significant cost reduction compared to the naive approach. We apply the proposed algorithm to the kernel-based extraction of spatiotemporal patterns from chaotic dynamics, demonstrating its accuracy and comparing it with an alternative algorithm based on subsampling and Nyström extension.
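To make the low-rank step concrete, the following is a minimal sketch of a pivoted partial Cholesky factorization that builds $F$ with $K \approx F F^\top$ one kernel column at a time, so the full kernel matrix is never formed. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the bandwidth parameter `eps`, the tolerances, and the function names are all hypothetical choices made here. The second function shows one standard way to recover eigenpairs of $F F^\top$ from the small Gram matrix $F^\top F$ at $O(N r^2)$ cost; the bistochastic normalization itself is omitted from this sketch.

```python
import numpy as np


def gaussian_kernel_column(X, j, eps):
    """Column j of an assumed Gaussian kernel matrix (N kernel evaluations)."""
    sq_dists = np.sum((X - X[j]) ** 2, axis=1)
    return np.exp(-sq_dists / eps**2)


def pivoted_partial_cholesky(X, r, eps=1.0, tol=1e-10):
    """Return F (N x k, k <= r) with K ~= F @ F.T, plus the chosen pivots.

    Only the diagonal and at most r columns of K are evaluated.
    """
    N = X.shape[0]
    # Residual diagonal. For the Gaussian kernel k(x, x) = 1, so it is known
    # in closed form; a general kernel would need N evaluations here.
    diag = np.ones(N)
    F = np.zeros((N, r))
    pivots = []
    for m in range(r):
        p = int(np.argmax(diag))          # greedy pivot: largest residual
        if diag[p] <= tol:                # residual is numerically zero
            F = F[:, :m]
            break
        col = gaussian_kernel_column(X, p, eps)
        col -= F[:, :m] @ F[p, :m]        # remove previously captured part
        F[:, m] = col / np.sqrt(diag[p])
        diag -= F[:, m] ** 2              # update residual diagonal
        np.maximum(diag, 0.0, out=diag)   # guard against round-off
        pivots.append(p)
    return F, pivots


def eig_from_factor(F, rel_tol=1e-10):
    """Nonzero eigenpairs of F @ F.T computed from the small matrix F.T @ F."""
    G = F.T @ F                           # k x k Gram matrix, O(N k^2)
    w, V = np.linalg.eigh(G)              # eigenvalues in ascending order
    w, V = w[::-1], V[:, ::-1]            # reorder to descending
    keep = w > rel_tol * w[0]             # drop numerically null directions
    w, V = w[keep], V[:, keep]
    U = (F @ V) / np.sqrt(w)              # eigenvectors of F @ F.T
    return w, U
```

Calling `pivoted_partial_cholesky(X, r)` on an `N x d` data array returns the factor and the pivot indices, and `eig_from_factor(F)` then yields eigenvalues in descending order with the corresponding eigenvectors as columns. The greedy pivot rule picks the point with the largest remaining diagonal residual, which is the usual choice for this factorization.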
