One for all: A novel Dual-space Co-training baseline for Large-scale Multi-View Clustering
Zisen Kong, Zhiqiang Fu, Dongxia Chang, Yiming Wang, Yao Zhao
TL;DR
This work tackles large-scale multi-view clustering (MVC) by addressing view heterogeneity with a dual-space co-training framework. DSCMC jointly learns two things. The first is a discriminative anchor graph in the original space, obtained via view-specific projections $P^v$. The second is a latent-space representation, obtained via feature transformations $W^v$. The two spaces are aligned through a latent graph $Z$ and an anchor matrix $A$, and an element-wise strategy improves robustness to inconsistent information across views. The objective combines complementary and consistent information from both spaces. It is optimized by alternating updates that keep the objective value nonincreasing, and it runs in near-linear time in the number of samples. Empirically, DSCMC outperforms state-of-the-art large-scale MVC methods on nine diverse datasets. Ablation studies show clear gains attributable to its components and to its novel regularization choices, indicating strong practical value for scalable multi-view clustering.
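The TL;DR does not state the DSCMC objective. The sketch below therefore uses a deliberately simplified, hypothetical stand-in to illustrate two of the claimed properties: alternating updates that never increase the objective, and per-iteration cost linear in the number of samples $n$.

The assumed objective is $\sum_v \|X^v W^v - H\|_F^2 + \lambda \|W^v\|_F^2$ subject to $H^\top H = I$. Here $H$ is a hypothetical shared latent representation of size $n \times k$. This objective omits the anchor graph $Z$, the anchor matrix $A$, the projections $P^v$ and the element-wise strategy. The function names `objective` and `latent_cotrain` are illustrative only.

Each block update is the exact minimizer with the other block held fixed, which is why the recorded objective cannot increase.

```python
import numpy as np


def objective(Xs, Ws, H, lam):
    """Hypothetical simplified loss: sum_v ||X^v W^v - H||_F^2 + lam * ||W^v||_F^2."""
    return sum(
        np.linalg.norm(X @ W - H) ** 2 + lam * np.linalg.norm(W) ** 2
        for X, W in zip(Xs, Ws)
    )


def latent_cotrain(Xs, k, lam=1e-2, n_iter=20, seed=0):
    """Alternate exact block updates of W^v (ridge) and H (orthogonal Procrustes).

    Xs: list of (n, d_v) view matrices sharing the same n samples in the same order.
    Returns the per-view transformations, the shared latent H (n, k), and the
    objective value recorded after every full sweep.
    This is an illustrative stand-in, not the DSCMC update rules.
    """
    rng = np.random.default_rng(seed)
    n = Xs[0].shape[0]
    # Random orthonormal initialization of the shared latent representation H.
    H, _ = np.linalg.qr(rng.standard_normal((n, k)))
    history = []
    for _ in range(n_iter):
        # W^v step: closed-form ridge regression of H on X^v, O(n d_v^2 + d_v^3).
        Ws = [
            np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ H)
            for X in Xs
        ]
        # H step: minimizing over H^T H = I reduces to maximizing tr(H^T M),
        # whose solution is H = U V^T from the thin SVD of M, O(n k^2).
        M = sum(X @ W for X, W in zip(Xs, Ws))
        U, _, Vt = np.linalg.svd(M, full_matrices=False)
        H = U @ Vt
        history.append(objective(Xs, Ws, H, lam))
    return Ws, H, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    views = [rng.standard_normal((500, 20)), rng.standard_normal((500, 35))]
    Ws, H, history = latent_cotrain(views, k=4)
    print([round(float(v), 3) for v in history[:3]])
```

The orthonormality constraint on $H$ is what rules out the trivial minimizer $H = 0$, $W^v = 0$. In a clustering pipeline, the rows of $H$ would then be clustered, for example with k-means.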
Abstract
In this paper, we propose a novel multi-view clustering model named Dual-space Co-training Large-scale Multi-view Clustering (DSCMC). The main objective of our approach is to improve clustering performance by leveraging co-training in two distinct spaces. In the original space, we learn a projection matrix to obtain latent consistent anchor graphs from different views, capturing the inherent relationships and structures between data points within each view. Concurrently, we employ a feature transformation matrix to map samples from the various views into a shared latent space. This transformation aligns information across views, enabling a more comprehensive understanding of the underlying data distribution. We jointly optimize the construction of the latent consistent anchor graph and the feature transformation to generate a discriminative anchor graph. This graph captures the essential characteristics of the multi-view data and serves as a reliable basis for subsequent clustering. Moreover, we propose an element-wise method to mitigate the adverse impact of inconsistent information across views. Our algorithm has approximately linear computational complexity in the number of samples, which makes it applicable to large-scale datasets. Experiments demonstrate that our method substantially reduces computational cost while achieving superior clustering performance compared with existing approaches.
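The linear-complexity claim rests on anchor graphs. Instead of an $n \times n$ similarity matrix, each view uses an $n \times m$ graph to $m \ll n$ anchors.

The NumPy sketch below illustrates that generic anchor-graph technique, not DSCMC itself. It makes several choices, all of them illustrative assumptions:

- Anchors are sampled at random and shared across views.
- Each sample links to its $r$ nearest anchors with Gaussian weights.
- Views are fused by plain averaging.
- Clusters come from the top left singular vectors of $Z D^{-1/2}$, followed by k-means.

The anchor sampling, the kernel bandwidth and the uniform fusion are assumptions. So are the function names `anchor_graph`, `kmeans` and `multiview_anchor_clustering`. DSCMC instead learns the anchors and the graph jointly with the latent transformation.

```python
import numpy as np


def anchor_graph(X, anchors, r=5):
    """Row-stochastic (n, m) graph linking each sample to its r nearest anchors."""
    # Squared Euclidean distances between samples and anchors, O(n m d).
    d2 = (X ** 2).sum(1)[:, None] - 2 * X @ anchors.T + (anchors ** 2).sum(1)[None, :]
    d2 = np.maximum(d2, 0.0)
    rows = np.arange(X.shape[0])[:, None]
    idx = np.argsort(d2, axis=1)[:, :r]
    near = d2[rows, idx]
    # Gaussian weights with a per-sample bandwidth (an illustrative choice).
    w = np.exp(-near / (near.mean(axis=1, keepdims=True) + 1e-12))
    Z = np.zeros_like(d2)
    Z[rows, idx] = w / w.sum(axis=1, keepdims=True)
    return Z


def kmeans(E, c, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    C = [E[0]]
    for _ in range(1, c):
        dist = np.min([((E - ctr) ** 2).sum(1) for ctr in C], axis=0)
        C.append(E[dist.argmax()])
    C = np.array(C)
    for _ in range(n_iter):
        labels = ((E[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        C = np.array([E[labels == j].mean(0) if np.any(labels == j) else C[j]
                      for j in range(c)])
    return labels


def multiview_anchor_clustering(Xs, c, m=50, r=5, seed=0):
    """Cluster n samples from several views via a fused (n, m) anchor graph."""
    rng = np.random.default_rng(seed)
    n = Xs[0].shape[0]
    # The same randomly chosen samples serve as anchors in every view.
    anchor_idx = rng.choice(n, size=m, replace=False)
    # Uniform average of the per-view graphs (a hypothetical fusion rule).
    Z = sum(anchor_graph(X, X[anchor_idx], r) for X in Xs) / len(Xs)
    # Spectral embedding from the top-c left singular vectors of Z D^{-1/2},
    # where D holds anchor degrees; the thin SVD costs O(n m^2), not O(n^3).
    deg = Z.sum(axis=0) + 1e-12
    U, _, _ = np.linalg.svd(Z / np.sqrt(deg)[None, :], full_matrices=False)
    return kmeans(U[:, :c], c)
```

With $m$ and $r$ fixed, every step scales linearly in $n$. The graph, the distances and the embedding occupy $O(nm)$ memory. This is the property that makes anchor-based methods suitable for large-scale data.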
