
Reliable Node Similarity Matrix Guided Contrastive Graph Clustering

Yunhui Liu, Xinyi Gao, Tieke He, Tao Zheng, Jianhua Zhao, Hongzhi Yin

TL;DR

Reliable Node Similarity Matrix Guided Contrastive Graph Clustering (NS4GC) is a new framework that estimates an approximately ideal node similarity matrix within the representation space and uses it to guide representation learning. It introduces node-neighbor alignment and semantic-aware sparsification so that the node similarity matrix is both accurate and efficiently sparse.

Abstract

Graph clustering, which involves the partitioning of nodes within a graph into disjoint clusters, holds significant importance for numerous subsequent applications. Recently, contrastive learning, known for utilizing supervisory information, has demonstrated encouraging results in deep graph clustering. This methodology facilitates the learning of favorable node representations for clustering by attracting positively correlated node pairs and distancing negatively correlated pairs within the representation space. Nevertheless, a significant limitation of existing methods is their inadequacy in thoroughly exploring node-wise similarity. For instance, some implicitly assume that the node similarity matrix within the representation space is an identity matrix, ignoring the inherent semantic relationships among nodes. Given the fundamental role of instance similarity in clustering, our research investigates contrastive graph clustering from the perspective of the node similarity matrix. We argue that an ideal node similarity matrix within the representation space should accurately reflect the inherent semantic relationships among nodes, ensuring the preservation of semantic similarities in the learned representations. In response to this, we introduce a new framework, Reliable Node Similarity Matrix Guided Contrastive Graph Clustering (NS4GC), which estimates an approximately ideal node similarity matrix within the representation space to guide representation learning. Our method introduces node-neighbor alignment and semantic-aware sparsification, ensuring the node similarity matrix is both accurate and efficiently sparse. Comprehensive experiments conducted on 8 real-world datasets affirm the efficacy of learning the node similarity matrix and the superior performance of NS4GC.
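
The notion of an ideal node similarity matrix can be illustrated with a minimal sketch. The six-node graph, its class labels, and the helper names below are hypothetical and are not taken from the paper. The sketch assumes the ideal matrix marks a pair with 1 when both nodes share a class and 0 otherwise, and counts how many node pairs an identity matrix and a noisy adjacency matrix get wrong relative to it.

```python
def ideal_similarity(labels):
    """S[i][j] = 1 if nodes i and j share a class label, else 0."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)] for i in range(n)]


def identity(n):
    """Each node is treated as similar only to itself."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    return a


def disagreements(a, b):
    """Number of unordered node pairs (i < j) on which two matrices differ."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i][j] != b[i][j])


# Hypothetical toy graph: two classes of three nodes each.
labels = [0, 0, 0, 1, 1, 1]
# Edge (2, 3) crosses classes (noise); pairs (0, 2), (3, 5), (4, 5) are
# same-class but not connected (incompleteness).
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]

ideal = ideal_similarity(labels)
print(disagreements(identity(len(labels)), ideal))          # 6
print(disagreements(adjacency(len(labels), edges), ideal))  # 4
```

In this toy case the identity matrix misses every same-class pair, while the adjacency matrix is closer to the ideal but still contains one spurious and three missing entries.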

Paper Structure

This paper contains 34 sections, 8 equations, 9 figures, 6 tables, 2 algorithms.

Figures (9)

  • Figure 1: Visualization of the node similarity matrix (NSM), with $\bullet$/$\circ$ denoting the pairwise semantic similarity/difference (1/0). (a) depicts the input attributed graph, where the digit is the node index and the color is the class index. (b) represents the input adjacency matrix, serving as a noisy and incomplete surrogate for the ideal node similarity matrix. (c) is the ideal node similarity matrix. (d) illustrates the node similarity matrix for node-node contrastive learning methods, such as GRACE, where it is implicitly assumed to be an identity matrix. (e) displays our learned node similarity matrix, characterized as a refined adjacency matrix, enriched with additional intra-cluster edges, as depicted in (f). Intra-cluster but disconnected node pairs, such as $(4,6)$, are expected to exhibit relatively high cosine similarity due to their inherently similar input features or the influence of the message passing mechanism in graph neural networks. Node-node contrastive methods such as GRACE impose heavier penalties on such pairs to distance them. In contrast, our semantic-aware sparsification mitigates heavy penalization for them, thereby preserving their high similarity to a considerable extent (see the Gradient Analysis section).
  • Figure 2: Overview of our proposed contrastive graph clustering framework NS4GC. For a given attributed graph, we first generate two distinct views via random augmentations: edge dropping and feature masking. These two views are subsequently fed into a shared GNN encoder to extract node representations. Then we instantiate the latent node similarity matrix using the cross-view cosine similarity. To optimize the model, we employ a combination of self-alignment loss, node-neighbor alignment loss, and a sparsity loss applied to the estimated node similarity matrix.
  • Figure 3: $\frac{\partial \mathcal{L}_{spa}}{\partial \boldsymbol{S}_{ij}}$ with $s = 0.5$ and different $\tau$ settings.
  • Figure 4: Node similarity matrices learned by GRACE, CCA-SSG, and NS4GC and the ideal node similarity matrix on Photo.
  • Figure 5: Impact of the split value $s$ on Cora, WikiCS, Photo and CoauthorCS.
  • ...and 4 more figures
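
The first stages of the pipeline described in the Figure 2 caption (two augmented views, a shared encoder, and a cross-view cosine similarity matrix) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `encode` function is a stand-in that averages each node's features with its neighbors' rather than a trained GNN, feature masking is assumed to zero whole feature columns, and the toy graph, drop probabilities, and function names are hypothetical. The three losses are omitted because their formulas are not given in this excerpt.

```python
import math
import random


def drop_edges(edges, p, rng):
    """Keep each undirected edge independently with probability 1 - p."""
    return [e for e in edges if rng.random() >= p]


def mask_features(x, p, rng):
    """Zero out whole feature columns, each with probability p (assumed scheme)."""
    keep = [rng.random() >= p for _ in range(len(x[0]))]
    return [[v if k else 0.0 for v, k in zip(row, keep)] for row in x]


def encode(x, edges):
    """Stand-in shared encoder: mean of a node's own and its neighbors' features."""
    n, dim = len(x), len(x[0])
    neigh = [[i] for i in range(n)]  # start with a self-loop
    for u, v in edges:
        neigh[u].append(v)
        neigh[v].append(u)
    return [[sum(x[j][d] for j in neigh[i]) / len(neigh[i]) for d in range(dim)]
            for i in range(n)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def cross_view_similarity(z1, z2):
    """S[i][j] = cosine similarity between node i in view 1 and node j in view 2."""
    return [[cosine(u, v) for v in z2] for u in z1]


# Hypothetical attributed graph: 6 nodes, 4 features, two loose groups.
x = [[1.0, 0.9, 0.1, 0.0], [0.9, 1.0, 0.0, 0.1], [1.0, 0.8, 0.1, 0.1],
     [0.1, 0.0, 1.0, 0.9], [0.0, 0.1, 0.9, 1.0], [0.1, 0.1, 1.0, 0.8]]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]

rng = random.Random(0)
z1 = encode(mask_features(x, 0.2, rng), drop_edges(edges, 0.2, rng))
z2 = encode(mask_features(x, 0.2, rng), drop_edges(edges, 0.2, rng))
S = cross_view_similarity(z1, z2)  # estimated node similarity matrix, 6 x 6

# Without augmentation both views coincide, so every diagonal entry is 1.
S_clean = cross_view_similarity(encode(x, edges), encode(x, edges))
```

In the framework as captioned, a matrix like `S` is what the self-alignment, node-neighbor alignment, and sparsity losses would then act on.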