Towards Robust Nonlinear Subspace Clustering: A Kernel Learning Approach
Kunpeng Xu, Lifei Chen, Shengrui Wang
TL;DR
DKLM tackles nonlinear subspace clustering by learning a data-driven kernel $\boldsymbol{\mathcal{K}}$ from the self-representation $\mathbf{Z}$ in an RKHS, so as to preserve local manifold structure and promote a block-diagonal affinity. Its objective combines a block-diagonal regularizer $\|\mathbf{Z}\|_{\boxed{k}}$ with a negative trace term $-\mathrm{Tr}(\boldsymbol{\mathcal{K}}\mathbf{Z})$ and is solved by alternating updates, with a Nyström-based kernel approximation for scalability. The framework is connected to kernel k-means and to low-rank/self-representation methods through a multiplicative triangle inequality and permutation-invariance properties. Experiments on synthetic, image, text, motion, and time-series data show higher robustness and clustering accuracy than state-of-the-art approaches, highlighting its practical value for real-world nonlinear data analysis.
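To make the role of the block-diagonal regularizer concrete, here is a minimal sketch. It assumes $\|\mathbf{Z}\|_{\boxed{k}}$ follows the definition commonly used in the block-diagonal representation literature: the sum of the $k$ smallest eigenvalues of the graph Laplacian of the affinity built from $\mathbf{Z}$. The summary above does not spell out the definition, so this form, the symmetrization step, and the function name `block_diag_regularizer` are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def block_diag_regularizer(Z, k):
    """Sum of the k smallest Laplacian eigenvalues of the affinity built from Z.

    Assumed definition (not taken from the paper text): the value is zero
    when the affinity graph has at least k connected components, i.e. when
    the affinity is block diagonal with at least k blocks up to a permutation.
    """
    # Assumed affinity construction: symmetric, nonnegative, zero diagonal.
    A = 0.5 * (np.abs(Z) + np.abs(Z).T)
    np.fill_diagonal(A, 0.0)
    # Unnormalized graph Laplacian L = D - A.
    L = np.diag(A.sum(axis=1)) - A
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigvals = np.linalg.eigvalsh(L)
    return float(eigvals[:k].sum())


# Toy self-representation with two clean blocks (sizes 2 and 3).
Z_blocks = np.zeros((5, 5))
Z_blocks[:2, :2] = 1.0
Z_blocks[2:, 2:] = 1.0

# Same matrix with one cross-block connection added.
Z_mixed = Z_blocks.copy()
Z_mixed[0, 4] = Z_mixed[4, 0] = 0.5

print(block_diag_regularizer(Z_blocks, 2))
print(block_diag_regularizer(Z_mixed, 2))
```

Under this assumed definition, the two-block matrix incurs essentially no penalty for $k = 2$, while the matrix with a cross-block link is penalized. This is the sense in which minimizing the term pushes the affinity toward $k$ blocks.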
Abstract
Kernel-based subspace clustering, which addresses nonlinear structures in data, is an evolving area of research. Despite notable progress, existing methods remain limited by (i) the influence of predefined kernels on model performance; (ii) the difficulty of preserving the original manifold structure in the nonlinear space; and (iii) the dependence of spectral-type strategies on an ideal block-diagonal structure of the affinity matrix. This paper presents DKLM, a novel paradigm for kernel-induced nonlinear subspace clustering. DKLM is a data-driven approach that learns the kernel directly from the data's self-representation, ensuring adaptive weighting and satisfying the multiplicative triangle inequality constraint, which enhances the robustness of the learned kernel. By leveraging this learned kernel, DKLM preserves the local manifold structure of the data in a nonlinear space while promoting the formation of an optimal block-diagonal affinity matrix. A thorough theoretical analysis of DKLM reveals its relationship with existing clustering paradigms. Comprehensive experiments on synthetic and real-world datasets demonstrate the effectiveness of the proposed method.
