Dual Consistent Constraint via Disentangled Consistency and Complementarity for Multi-view Clustering
Bo Li, Jing Yun
TL;DR
This work tackles multi-view clustering by modeling both shared semantics (consistency) and view-specific information (complementarity). It introduces DCCMVC, a disentangled variational autoencoder that splits each latent representation into a shared component $Z_s$ and a private component $Z_p$, using KL-regularized priors and a Gumbel-Softmax for consistency. Dual constraints are imposed: within-view and cross-view reconstructions that leverage private and shared information, plus a contrastive learning objective that maximizes mutual information between views in the latent space. Experiments on eight diverse datasets show state-of-the-art clustering performance, and ablations highlight the value of explicitly modeling both consistency and complementarity for scalable, interpretable multi-view clustering.
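The Gumbel-Softmax mentioned above is a general technique for drawing differentiable, approximately one-hot samples from a categorical distribution. As a minimal sketch of that relaxation in general (not the authors' implementation), the pure-Python function below perturbs the logits with Gumbel noise and applies a temperature-scaled softmax. The function name `gumbel_softmax`, the `temperature` and `rng` parameters, and the example logits are assumptions made for illustration.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Draw one relaxed categorical sample from unnormalized log-probabilities.

    Illustrative sketch only: the name, parameters, and defaults are
    assumptions, not taken from the paper.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise: g = -log(-log(u)), with u ~ Uniform(0, 1).
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)

    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(value - peak) for value in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Example: three hypothetical cluster logits for one sample.
sample = gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=random.Random(0))
```

The returned list is a probability vector over categories. Lower temperatures push it toward a one-hot vector, while higher temperatures make it smoother, which is the usual trade-off between sample discreteness and gradient variance.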
Abstract
Multi-view clustering can explore common semantics from multiple views and has received increasing attention in recent years. However, current methods focus on learning consistency in representation and neglect the contribution of each view's complementary information to representation learning. This limitation poses a significant challenge in multi-view representation learning. This paper proposes a novel multi-view clustering framework built on a disentangled variational autoencoder that separates multi-view data into shared and private information, i.e., consistency and complementarity information. We first learn informative and consistent representations by maximizing mutual information across different views through contrastive learning, a process that on its own ignores complementary information. We then employ consistency inference constraints to explicitly utilize complementary information while seeking the consistency of shared information across all views. Specifically, we perform a within-reconstruction using the private and shared information of each view and a cross-reconstruction using the shared information of all views. The dual consistency constraints not only improve the representation quality of the data but are also easy to extend to other scenarios, especially complex multi-view scenes. This may be the first attempt to employ dual consistent constraints in a unified MVC theoretical framework. During training, the consistency and complementarity features are jointly optimized. Extensive experiments show that our method outperforms baseline methods.
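The within- and cross-reconstruction described in the abstract can be written down schematically. The following is only a sketch under assumed notation, not the paper's stated objective: $x^v$ is the input of view $v \in \{1, \dots, V\}$, $z_s^v$ and $z_p^v$ are its shared and private codes, $D^v$ is a hypothetical decoder for view $v$, and squared error stands in for whatever reconstruction loss the authors actually use. The cross term assumes that a view is rebuilt from its own private code paired with the shared code of another view, which is one plausible reading of "using the shared information of all views".

$$
\mathcal{L}_{\text{within}} = \sum_{v=1}^{V} \left\| x^v - D^v\!\left(z_s^v, z_p^v\right) \right\|_2^2,
\qquad
\mathcal{L}_{\text{cross}} = \sum_{v=1}^{V} \sum_{u \neq v} \left\| x^v - D^v\!\left(z_s^u, z_p^v\right) \right\|_2^2 .
$$

Under this reading, the cross term can only be small if the shared codes of different views are interchangeable, which pushes them toward common semantics, while the private code carries what is specific to each view.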
