Fast and Scalable Semi-Supervised Learning for Multi-View Subspace Clustering
Huaming Ling, Chenglong Bao, Jiebo Song, Zuoqiang Shi
TL;DR
The paper tackles the scalability challenge of multi-view subspace clustering by introducing FSSMSC, a unified framework that jointly learns a consensus anchor graph and the landmarks’ low-dimensional representations. It optimizes anchor graph construction and label propagation together in a landmark-based, multi-view setting and solves the resulting model with an ADMM-like alternating algorithm that has convergence guarantees, giving computational and space complexity that is linear in the data size. Empirical results across seven benchmark datasets show that FSSMSC outperforms scalable baselines in clustering accuracy while reducing running time and memory usage, especially on large-scale data. This makes semi-supervised clustering practical on large multi-view data where full affinity matrices are prohibitively expensive.
Abstract
In this paper, we introduce a Fast and Scalable Semi-supervised Multi-view Subspace Clustering (FSSMSC) method, a novel solution to the high computational complexity commonly found in existing approaches. FSSMSC has computational and space complexity that is linear in the size of the data. The method generates a consensus anchor graph across all views, representing each data point as a sparse linear combination of chosen landmarks. Unlike traditional methods that handle anchor graph construction and label propagation separately, we propose a unified optimization model that learns both simultaneously. An effective alternating update algorithm with convergence guarantees is proposed to solve the unified optimization model. The method then uses the learned anchor graph and the landmarks' low-dimensional representations to derive low-dimensional representations for the raw data. Finally, a straightforward clustering approach is applied to these low-dimensional representations to obtain the final clustering results. The effectiveness and efficiency of FSSMSC are validated through extensive experiments on multiple benchmark datasets of varying scales.
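To make the landmark idea concrete, the following is a minimal single-view sketch in Python with NumPy. It is not the FSSMSC optimization: the anchor graph here is built with a generic k-nearest-landmark Gaussian-kernel heuristic rather than learned, the landmark representations are hand-set one-hot vectors standing in for learned ones, and the final clustering step is a plain argmax. The names `anchor_graph`, `propagate`, `k`, and the toy data are all assumptions made for illustration. The sketch only shows how an n-by-m anchor graph Z and m landmark representations yield representations for all n points at a cost linear in n.

```python
import numpy as np

def anchor_graph(X, landmarks, k=2):
    """Hypothetical helper: n x m row-stochastic matrix with k nonzeros per row.

    Each point is written as a convex combination of its k nearest landmarks,
    weighted by a Gaussian kernel (a common heuristic, not the paper's model).
    """
    # Squared Euclidean distances between every point and every landmark: n x m.
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k nearest landmarks for each point (unordered within the k).
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
    rows = np.arange(X.shape[0])[:, None]
    near = d2[rows, idx]
    sigma2 = near.mean() + 1e-12          # kernel bandwidth, chosen ad hoc
    w = np.exp(-near / sigma2)
    w /= w.sum(axis=1, keepdims=True)     # each row sums to 1
    Z = np.zeros_like(d2)                 # dense for brevity; sparse storage would suit large n
    Z[rows, idx] = w
    return Z

def propagate(Z, landmark_repr):
    """Representations of all points as Z times the landmark representations."""
    return Z @ landmark_repr

# Toy data (assumption): two well-separated 2-D blobs of 50 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(10.0, 0.5, (50, 2))])

# Four hand-picked landmarks, two per blob (assumption).
landmarks = np.array([[0.0, 0.0], [0.5, 0.5], [10.0, 10.0], [9.5, 9.5]])

# Stand-in for learned landmark representations: one-hot cluster indicators.
landmark_repr = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

Z = anchor_graph(X, landmarks, k=2)
F = propagate(Z, landmark_repr)           # n x 2 low-dimensional representations
labels = F.argmax(axis=1)                 # simple stand-in for the clustering step
```

With m landmarks and n points, building Z costs O(nm) distance evaluations and the product `Z @ landmark_repr` touches only k entries per row when Z is stored sparsely, which is where the linear dependence on n comes from in landmark-based methods of this kind.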
