Clustering-Oriented Generative Attribute Graph Imputation

Mulin Chen, Bocheng Wang, Jiaxin Zhong, Zongcheng Miao, Xuelong Li

TL;DR

This work tackles attribute-missing graph clustering by jointly performing clustering-aware imputation and edge-oriented refinement. It introduces CGIR, which first learns subcluster distributions to constrain generative imputation and then applies an Edge Attention Network with contrastive learning to identify edge-wise attributes for reliable graph reconstruction. The approach combines a subcluster-aware generator, a discriminator, and an edge-attentional refinement module, trained with an alternating adversarial objective and a subcluster regularizer. Empirical results on four benchmarks demonstrate robust clustering performance under substantial attribute missingness, with ablations confirming the utility of subcluster modeling and edge-focused refinement for practical unsupervised graph clustering.

Abstract

Attribute-missing graph clustering has emerged as a significant unsupervised task, in which attribute vectors are available for only part of the nodes while the graph structure is intact. Existing models generally follow a two-step paradigm of imputation and refinement. However, most imputation approaches fail to capture class-relevant semantic information, leading to sub-optimal imputation for clustering. Moreover, existing refinement strategies optimize the learned embedding through graph reconstruction, neglecting the fact that some attributes are uncorrelated with the graph. To remedy these problems, we establish the Clustering-oriented Generative Imputation with reliable Refinement (CGIR) model. Concretely, subcluster distributions are estimated to reveal class-specific characteristics precisely and to constrain the sampling space of the generative adversarial module, so that the imputed nodes are pushed to align with the correct clusters. Afterwards, multiple subclusters are merged to guide the proposed edge attention network, which identifies the edge-wise attributes for each class, so that redundant attributes in graph reconstruction do not disturb the refinement of the overall embedding. In sum, CGIR splits attribute-missing graph clustering into the search for and merging of subclusters, which guides node imputation and refinement within a unified framework. Extensive experiments demonstrate the advantages of CGIR over state-of-the-art competitors.
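
The idea of constraining the generator's sampling space with subcluster distributions can be illustrated with a minimal sketch. It assumes diagonal Gaussians fitted to observed embeddings under given subcluster labels; the function names `fit_subclusters` and `sample_latent`, the toy data, and the way a missing node is assigned to a subcluster are all hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def fit_subclusters(embeddings, labels):
    """Estimate a diagonal Gaussian N(mu, sigma^2) for each subcluster label."""
    groups = defaultdict(list)
    for vec, lab in zip(embeddings, labels):
        groups[lab].append(vec)
    params = {}
    for lab, vecs in groups.items():
        n, dim = len(vecs), len(vecs[0])
        mu = [sum(v[d] for v in vecs) / n for d in range(dim)]
        # Population variance per dimension (assumption: diagonal covariance).
        var = [sum((v[d] - mu[d]) ** 2 for v in vecs) / n for d in range(dim)]
        params[lab] = (mu, var)
    return params


def sample_latent(params, label, rng):
    """Draw a generator input from the Gaussian of one subcluster."""
    mu, var = params[label]
    return [rng.gauss(m, v ** 0.5) for m, v in zip(mu, var)]


# Toy embeddings of attribute-observed nodes and hypothetical subcluster labels.
Z = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
labels = [0, 0, 1, 1]
params = fit_subclusters(Z, labels)

# A node with missing attributes, assumed here to belong to subcluster 1,
# gets its latent sample from that subcluster instead of a global prior.
rng = random.Random(0)
z_missing = sample_latent(params, 1, rng)
```

Sampling from a per-subcluster distribution rather than a single shared prior keeps the generated embedding near nodes of the same subcluster, which is the intuition the abstract describes; how CGIR actually learns and uses these distributions is specified in the paper itself.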

Paper Structure

This paper contains 23 sections, 18 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Challenge and opportunity of attribute graph imputation. (a) The neighborhood derived from KNN may include inter-class nodes, so the imputation result may deviate from the true class, resulting in an ambiguous cluster structure. (b) The subcluster provides a clustering-oriented distribution to guide flexible imputation, so the imputation result stays close to intra-class nodes, which facilitates a clear cluster structure.
  • Figure 2: Overall flowchart of CGIR. Note that only one layer of the edge attention network is drawn. $\mathbf{Z}$ is the initial graph embedding, $\{\mathcal{N}(\mu_1, \sigma_1^2), \mathcal{N}(\mu_2, \sigma_2^2), ..., \mathcal{N}(\mu_m, \sigma_m^2) \}$ is a series of learned subcluster distributions, and $\mathbf{F}$ is the embedding updated by generative imputation. In the edge attention network, $\mathbf{Q}$, $\mathbf{K}$, and $\mathbf{V}$ are the query, key, and value embeddings respectively, $\mathbf{A}$ is the weight matrix, and $\mathbf{U}$ is the output embedding that emphasizes the edge-wise attributes. The framework is updated by the alternating training described in Section \ref{sec:training}.
  • Figure 3: Comparison results on two benchmarks across a range of missing ratios (0 to 0.9 in steps of 0.1).
  • Figure 4: Running time (s) of graph learning models on four benchmarks with 0.2 missing ratio.
  • Figure 5: Ablation comparison of subcluster-aware loss on two benchmarks with 0.2 missing ratio.
  • ...and 3 more figures
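
The roles of $\mathbf{Q}$, $\mathbf{K}$, $\mathbf{V}$, $\mathbf{A}$, and $\mathbf{U}$ named in the Figure 2 caption can be made concrete with a small sketch. It shows generic scaled dot-product attention restricted to graph edges (plus self-loops), which is an assumption for illustration only; the paper's edge attention network may compute the weight matrix differently. The function name `edge_masked_attention` and the toy inputs are hypothetical.

```python
import math


def edge_masked_attention(Q, K, V, adj):
    """Return (A, U): attention weights limited to edges, and U = A V."""
    n, d = len(Q), len(Q[0])
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Attend only to neighbors and to the node itself (assumed self-loop).
        nbrs = [j for j in range(n) if adj[i][j] or i == j]
        scores = [
            sum(Q[i][t] * K[j][t] for t in range(d)) / math.sqrt(d)
            for j in nbrs
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        for j, e in zip(nbrs, exps):
            A[i][j] = e / total
    out_dim = len(V[0])
    U = [
        [sum(A[i][j] * V[j][t] for j in range(n)) for t in range(out_dim)]
        for i in range(n)
    ]
    return A, U


# Toy graph: nodes 0 and 1 are connected, node 2 is isolated.
adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
Q = K = V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A, U = edge_masked_attention(Q, K, V, adj)
```

In this sketch each row of `A` is a probability distribution over a node's neighborhood, so non-adjacent nodes contribute nothing to the output embedding `U`.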