Dual Consistent Constraint via Disentangled Consistency and Complementarity for Multi-view Clustering

Bo Li, Jing Yun

TL;DR

This work tackles multi-view clustering by addressing both shared semantics (consistency) and view-specific information (complementarity). It introduces DCCMVC, a disentangled variational autoencoder that splits latent representations into a shared $Z_s$ and private $Z_p$ component, using KL-regularized priors and a Gumbel-Softmax for consistency. Dual constraints are imposed: within-view and cross-view reconstructions to leverage private and shared information, plus a contrastive learning objective that maximizes mutual information between views in the latent space. Experiments on eight diverse datasets show state-of-the-art clustering performance and robust ablations, highlighting the value of explicitly modeling both consistency and complementarity for scalable, interpretable multi-view clustering.
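
The Gumbel-Softmax relaxation mentioned above can be illustrated with a short, standard-library-only sketch. It shows the general technique (perturb categorical logits with Gumbel noise, then apply a temperature-scaled softmax), not the authors' implementation. The function name `gumbel_softmax`, the example logits, and the temperature value are assumptions made for illustration; the summary does not say how DCCMVC parameterizes this step.

```python
import math
import random


def gumbel_softmax(logits, temperature=0.5, rng=random):
    """Draw one relaxed (soft) one-hot sample from categorical logits.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))  # sample from Gumbel(0, 1)
        noisy.append((logit + gumbel_noise) / temperature)

    # Numerically stable softmax over the perturbed, scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Example: three hypothetical cluster logits for one sample.
rng = random.Random(0)
sample = gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=rng)
```

The result is a probability vector that stays differentiable with respect to the logits in an autodiff framework. Lower temperatures push samples closer to one-hot vectors, at the cost of higher gradient variance.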

Abstract

Multi-view clustering explores common semantics across multiple views and has received increasing attention in recent years. However, current methods focus on learning consistent representations and neglect the contribution of each view's complementary information. This limitation poses a significant challenge in multi-view representation learning. This paper proposes a novel multi-view clustering framework built on a disentangled variational autoencoder that separates multi-view data into shared and private information, i.e., consistency and complementarity information. We first learn informative and consistent representations by maximizing mutual information across different views through contrastive learning. This step alone ignores complementary information. We then employ consistency inference constraints to explicitly utilize complementary information while seeking consistency of the shared information across all views. Specifically, we perform a within-reconstruction using the private and shared information of each view and a cross-reconstruction using the shared information of all views. The dual consistency constraints not only improve the representation quality of the data but also extend easily to other scenarios, especially complex multi-view scenes. This could be the first attempt to employ a dual consistent constraint in a unified MVC theoretical framework. During training, the consistency and complementarity features are jointly optimized. Extensive experiments show that our method outperforms baseline methods.
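
The abstract's step of maximizing mutual information across views through contrastive learning can be sketched with an InfoNCE-style loss, a common contrastive objective whose minimization maximizes a lower bound on mutual information. This is a minimal sketch under stated assumptions: the abstract does not give the exact loss, so the function `info_nce`, the cosine similarity, the temperature, and the toy embeddings are all illustrative choices rather than the authors' formulation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(view1, view2, temperature=0.5):
    """Mean InfoNCE loss over samples (hypothetical helper).

    Row i of view1 and row i of view2 embed the same sample (positive pair);
    every other row of view2 acts as a negative for row i of view1.
    """
    n = len(view1)
    loss = 0.0
    for i in range(n):
        logits = [cosine(view1[i], view2[j]) / temperature for j in range(n)]
        # Stable log-sum-exp over all candidates (positive and negatives).
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(v - peak) for v in logits))
        # Negative log-probability of picking the true partner.
        loss += log_denominator - logits[i]
    return loss / n


# Toy shared embeddings for three samples in two views.
z_view1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
z_aligned = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]   # views agree per sample
z_shuffled = [[0.0, 1.0], [-1.0, 0.0], [1.0, 0.0]]  # views disagree per sample

loss_aligned = info_nce(z_view1, z_aligned)
loss_shuffled = info_nce(z_view1, z_shuffled)
```

When the two views agree on each sample the loss is small, and when the pairing is shuffled it grows. This is the sense in which such an objective rewards cross-view consistency while carrying no term for view-specific (private) information.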

Paper Structure

This paper contains 18 sections, 17 equations, 12 figures, 4 tables, 1 algorithm.

Figures (12)

  • Figure 1: Our basic idea. Taking bi-view data as a showcase, on the left of the figure we use two oval panels to denote the two views, and polygons with different colors and shapes to indicate different categories. On the right of the figure, the solid and dotted rectangles denote the shared information and the private information, respectively. The red dotted rectangles denote ignoring the role of private information in multi-view clustering. The orange brackets indicate the reconstruction of multi-view data. Traditional methods improve clustering performance by pursuing multi-view consistency, which ignores private information. In contrast, while the dual consistency constraints guarantee consistency learning, our method also utilizes private information for view reconstruction. The goal of the cross-view consistency constraint is to maximize the mutual information between View 1 and View 2. The goal of the shared information consistency constraint is to properly express View 1 and View 2. Since complementarity information is specific to each view, multi-view representation learning must explicitly preserve it to guarantee complete information during reconstruction.
  • Figure 2: Overview of the framework. Bi-view data is used as a showcase in this figure. Our method jointly optimizes the following learning objectives: reconstruction, variational reconstruction, the shared information consistency inference constraint, and the contrastive learning consistency constraint. Specifically, the goal of reconstruction is to maintain the diversity of views and project all views into latent spaces. Variational reconstruction consists of within-view reconstruction and cross-view reconstruction, which together ensure the quality of the data representation. The shared information consistency inference objective ensures the consistency of shared information across views in the latent space and helps capture the common features of each sample. Contrastive learning maximizes the mutual information among different views in pursuit of consistency. This dual consistent constraint interacts with, and is jointly optimized with, the VAE network, thus improving multi-view clustering performance.
  • Figure 3: The structure of variational autoencoder (VAE).
  • Figure 4: The process of view reconstruction.
  • Figure 5: The process of shared information consistency inference. $Z_{p}^{(1)}$ and $Z_{p}^{(2)}$ denote the private information of view $X^{(1)}$ and view $X^{(2)}$, respectively. $Z_{s}$ denotes the shared information obtained through the consistency inference constraint. $\hat{X}^{(1)}$ and $\hat{X}^{(2)}$ denote the views reconstructed from the private information of each view and the shared information of all views. We elaborate on the shared information consistency inference constraint in Eq. \ref{eq10}.
  • ...and 7 more figures