Dynamic Deep Graph Learning for Incomplete Multi-View Clustering with Masked Graph Reconstruction Loss

Zhenghao Zhang, Jun Xie, Xingchen Chen, Tao Yu, Hongzhu Yi, Kaixin Xu, Yuanxiang Wang, Tianyu Zong, Xinming Wang, Jiahuan Chen, Guoqing Chao, Feng Chen, Zhepeng Wang, Jungang Xu

TL;DR

This work tackles incomplete multi-view clustering by replacing static graphs with dynamic, robust graphs learned during training. It integrates a GCN-based embedding layer to impute missing views and generate refined view-specific graphs, a GAT encoder to learn edge-aware representations, and a masked graph reconstruction loss to reduce gradient noise during optimization. Graph-structure contrastive learning aligns inter-view graphs, while a pseudo-label–driven clustering module guides self-supervised refinement. Across four datasets and varying missing rates, DGIMVCM demonstrates superior clustering performance and robustness, validated by ablations and loss-function comparisons that highlight the contribution of each component.

Abstract

The prevalence of real-world multi-view data makes incomplete multi-view clustering (IMVC) a crucial research topic. The rapid development of Graph Neural Networks (GNNs) has established them as one of the mainstream approaches for multi-view clustering. Despite significant progress in GNN-based IMVC, two challenges remain: (1) Most methods rely on the K-Nearest Neighbors (KNN) algorithm to construct static graphs from raw data, which introduces noise and diminishes the robustness of the graph topology. (2) Existing methods typically use the Mean Squared Error (MSE) between the reconstructed graph and the sparse adjacency graph directly as the graph reconstruction loss, leading to substantial gradient noise during optimization. To address these issues, we propose Dynamic Deep Graph Learning for Incomplete Multi-View Clustering with Masked Graph Reconstruction Loss (DGIMVCM). Firstly, we construct a missing-robust global graph from the raw data. A graph convolutional embedding layer is then designed to extract primary features and refined dynamic view-specific graph structures, leveraging the global graph to impute missing views. This process is complemented by graph structure contrastive learning, which identifies consistency among view-specific graph structures. Secondly, a graph self-attention encoder is introduced to extract high-level representations from the imputed primary features and view-specific graphs, and is optimized with a masked graph reconstruction loss to mitigate gradient noise during optimization. Finally, a clustering module is constructed and optimized through a pseudo-label self-supervised training mechanism. Extensive experiments on multiple datasets validate the effectiveness and superiority of DGIMVCM.
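
The contrast between a plain MSE graph reconstruction loss and a masked one can be illustrated with a minimal NumPy sketch. It is not the paper's formulation, which is not reproduced on this page. The sketch assumes the reconstructed graph is `sigmoid(H H^T)` and that the mask keeps the `k` largest entries in each row of the target graph; the function names `topk_mask`, `masked_recon_loss`, and `full_recon_loss` are hypothetical.

```python
import numpy as np

def topk_mask(adj, k):
    """Boolean mask that is True on the k largest entries of each row of adj."""
    n = adj.shape[0]
    # argpartition on the negated matrix puts the k largest entries first (unordered).
    idx = np.argpartition(-adj, k - 1, axis=1)[:, :k]
    mask = np.zeros(adj.shape, dtype=bool)
    mask[np.arange(n)[:, None], idx] = True
    return mask

def reconstruct(h):
    """Assumed decoder: inner-product similarity squashed by a sigmoid."""
    return 1.0 / (1.0 + np.exp(-(h @ h.T)))

def full_recon_loss(h, adj):
    """Plain MSE over every entry, including the many near-zero ones."""
    return float(((reconstruct(h) - adj) ** 2).mean())

def masked_recon_loss(h, adj, k):
    """MSE restricted to the k strongest edges per row (assumed masking rule)."""
    sq_err = (reconstruct(h) - adj) ** 2
    return float(sq_err[topk_mask(adj, k)].mean())
```

Under these assumptions, only the selected edges contribute gradient, so the many weak or zero entries of a sparse graph no longer dominate the update.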

Paper Structure

This paper contains 39 sections, 26 equations, 8 figures, 6 tables, 1 algorithm.

Figures (8)

  • Figure 1: Overview of the DGIMVCM framework. The framework comprises four components: global graph fusion, an embedding layer, a graph self-attention encoder, and a clustering module. Initially, a global graph $\overline{A}$ is constructed by fusing view-specific similarity matrices, with the edges of missing samples pruned. Subsequently, a GCN-based embedding layer uses $\overline{A}$ to extract imputed primary features $\{Z^v\}_{v=1}^V$ and view-specific graphs $\{A^v\}_{v=1}^V$, whose missing structure is refined by the global graph; this layer is optimized via contrastive learning on the consistency of view-specific graph structures. Next, a graph self-attention encoder extracts high-level features $\{H^v\}_{v=1}^V$ and is optimized by a masked graph reconstruction loss that focuses only on the $K$ strongest edges. Finally, a clustering module performs self-supervised training using pseudo-labels and obtains the final clustering results.
  • Figure 2: Accuracy on four datasets with different missing rates.
  • Figure 3: Parameter sensitivity analysis for $\alpha$ and $\beta$ on Landuse-21 with missing rate of 0.5.
  • Figure 4: NMI on four datasets with different missing rates. (a) HW, (b) Scene-15, (c) 100Leaves, (d) Landuse-21.
  • Figure 5: ARI on four datasets with different missing rates. (a) HW, (b) Scene-15, (c) 100Leaves, (d) Landuse-21.
  • ...and 3 more figures
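
The global graph fusion step named in the Figure 1 caption (fusing view-specific similarity matrices while pruning the edges of missing samples) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: cosine similarity and averaging over the views in which both samples are observed are choices made here, and `cosine_similarity`, `fuse_global_graph`, `views`, and `observed` are hypothetical names.

```python
import numpy as np

def cosine_similarity(x):
    """Pairwise cosine similarity between rows; all-zero rows yield zero similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.maximum(norms, 1e-12)
    return unit @ unit.T

def fuse_global_graph(views, observed):
    """Fuse per-view similarity matrices into one (n, n) global graph.

    views:    list of V arrays, each of shape (n, d_v)
    observed: boolean array of shape (n, V); True where a sample exists in a view
    """
    n = observed.shape[0]
    total = np.zeros((n, n))
    count = np.zeros((n, n))
    for v, x in enumerate(views):
        present = observed[:, v]
        # Zero out missing rows so placeholder values cannot leak into the graph.
        x_clean = np.where(present[:, None], x, 0.0)
        # pair[i, j] is 1 only when both samples are observed in view v (edge pruning).
        pair = np.outer(present, present).astype(float)
        total += cosine_similarity(x_clean) * pair
        count += pair
    # Average over the views that actually contain both endpoints (assumed fusion rule).
    return total / np.maximum(count, 1.0)
```

With this rule, a pair of samples observed together in only one view still receives an edge weight from that view, which is one simple way to keep the fused graph usable when views are missing.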