Trustworthy Neighborhoods Mining: Homophily-Aware Neutral Contrastive Learning for Graph Clustering

Liang Peng, Yixuan Ye, Cheng Liu, Hangjun Che, Man-Fai Leung, Si Wu, Hau-San Wong

TL;DR

This work tackles graph clustering under the real-world variation in homophily by introducing NeuCGC, a homophily-aware framework that treats neutral pairs as weighted, partial positives so that contrastive learning adapts to how trustworthy each neighborhood is. It combines pseudo-Siamese encoders with global feature distribution alignment and a novel neutral contrastive distribution alignment, together with an adaptive feature consistency module that expands reliable neighborhood information via a high-confidence graph. Empirical results on homophilic and heterophilic datasets show that NeuCGC achieves state-of-the-art clustering performance, particularly on low-homophily graphs, while maintaining scalability comparable to InfoNCE-based methods. Ablations confirm the contribution of each component, and a sensitivity analysis offers practical guidance on hyperparameters.

Abstract

Recently, neighbor-based contrastive learning has been introduced to effectively exploit neighborhood information for clustering. However, these methods rely on the homophily assumption, namely that connected nodes share similar class labels and should therefore be close in feature space, which fails to account for the varying homophily levels of real-world graphs. As a result, applying contrastive learning to low-homophily graphs may produce indistinguishable node representations because the neighborhood information is unreliable, which makes it challenging to identify trustworthy neighborhoods across varying homophily levels in graph clustering. To tackle this, we introduce a novel neighborhood Neutral Contrastive Graph Clustering method, NeuCGC, that extends traditional contrastive learning by incorporating neutral pairs, i.e., node pairs treated as weighted positive pairs rather than strictly positive or negative ones. These neutral pairs are dynamically adjusted based on the graph's homophily level, enabling a more flexible and robust learning process. Leveraging neutral pairs in contrastive learning, our method incorporates two key components: (1) an adaptive contrastive neighborhood distribution alignment that adjusts to the homophily level of the given attributed graph, ensuring effective alignment of neighborhood distributions, and (2) a contrastive neighborhood node feature consistency learning mechanism that leverages reliable neighborhood information from high-confidence graphs to learn robust node representations, mitigating the adverse effects of varying homophily levels and effectively exploiting highly trustworthy neighborhood information. Experimental results demonstrate the effectiveness and robustness of our approach, which outperforms other state-of-the-art graph clustering methods. Our code is available at https://github.com/THPengL/NeuCGC.
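The neutral-pair idea can be illustrated with a minimal sketch in plain Python. It assumes an InfoNCE-style loss over cosine similarities in which a node's counterpart in the other view is a full positive, each graph neighbor is a neutral pair added to the numerator with a weight `eta` in [0, 1], and all cross-view pairs appear in the denominator. The function `neutral_contrastive_loss`, the temperature `tau`, and this exact placement of `eta` are hypothetical choices for illustration, not the loss defined in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def neutral_contrastive_loss(z1, z2, neighbors, eta, tau=0.5):
    """Hypothetical neutral contrastive loss (illustration, not NeuCGC's exact loss).

    z1, z2: node embeddings from two views, in the same node order
    neighbors: dict mapping node index -> list of distinct neighbor indices
    eta: weight in [0, 1] for neighbor (neutral) pairs; 0 ignores neighbors,
         1 treats them as full positives
    tau: temperature
    """
    n = len(z1)
    total = 0.0
    for i in range(n):
        # Temperature-scaled, exponentiated similarities from node i in view 1
        # to every node in view 2.
        sims = [math.exp(cosine(z1[i], z2[j]) / tau) for j in range(n)]
        positive = sims[i]  # the same node in the other view: full positive
        neutral = eta * sum(sims[j] for j in neighbors.get(i, []) if j != i)
        denominator = sum(sims)  # every cross-view pair
        total += -math.log((positive + neutral) / denominator)
    return total / n


# Toy example: nodes 0 and 1 are linked and similar; node 2 is isolated.
z1 = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
z2 = [[1.0, 0.1], [1.0, 0.0], [0.1, 1.0]]
nbrs = {0: [1], 1: [0], 2: []}
loss_ignore_neighbors = neutral_contrastive_loss(z1, z2, nbrs, eta=0.0)
loss_neutral = neutral_contrastive_loss(z1, z2, nbrs, eta=0.5)
loss_full_positive = neutral_contrastive_loss(z1, z2, nbrs, eta=1.0)
```

Lowering `eta` weakens the pull between linked nodes, which is the behavior one would want on a low-homophily graph; how NeuCGC actually estimates its factor (the NCFE step named in Figure 1) is not reproduced in this sketch.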

Paper Structure

This paper contains 27 sections, 1 theorem, 23 equations, 7 figures, 6 tables, 1 algorithm.

Key Result

Theorem 1

Minimizing the AFC objective $\mathcal{L}_{AFC}$ incorporates the maximization of the InfoNCE objective $I_{NCE}$, which is equivalent to maximizing the mutual information between the original attributes $\mathbf{X}$ and the two latent representations $\mathbf{Z}^{(1)}$ and $\mathbf{Z}^{(2)}$. Therefore, optimizing the AFC objective can lead to better representations than optimizing the InfoNCE objective alone.
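For reference, the standard InfoNCE objective that Theorem 1 builds on can be sketched as follows. This is the generic two-view InfoNCE loss (node $i$ in one view is the positive for node $i$ in the other, all other nodes are negatives), together with the well-known bound $I(\mathbf{Z}^{(1)}; \mathbf{Z}^{(2)}) \geq \log N - \mathcal{L}_{NCE}$ from van den Oord et al. (2018). It is not the AFC loss itself, and the helper names and the temperature `tau` are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(z1, z2, tau=0.5):
    """Standard InfoNCE loss averaged over nodes.

    Node i in view 1 and node i in view 2 form the positive pair; every other
    node in view 2 serves as a negative.
    """
    n = len(z1)
    total = 0.0
    for i in range(n):
        logits = [cosine(z1[i], z2[j]) / tau for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / n


def mi_lower_bound(z1, z2, tau=0.5):
    """InfoNCE bound (van den Oord et al., 2018): I(Z1; Z2) >= log N - L_NCE."""
    return math.log(len(z1)) - info_nce(z1, z2, tau)


view_1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]   # each node matches itself
shuffled = [[0.0, 1.0], [-1.0, 0.0], [1.0, 0.0]]  # node correspondence broken

loss_aligned = info_nce(view_1, aligned)
loss_shuffled = info_nce(view_1, shuffled)
bound_aligned = mi_lower_bound(view_1, aligned)
bound_shuffled = mi_lower_bound(view_1, shuffled)
```

Aligned views yield a lower loss, and hence a larger bound, than views whose node correspondence has been shuffled; this is the sense in which minimizing an InfoNCE-type loss maximizes a mutual-information lower bound.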

Figures (7)

  • Figure 1: Overview of the proposed NeuCGC. Given the node attributes $\textbf{X}$ and the adjacency matrix $\textbf{A}$ of a graph, we employ a pseudo-Siamese network to encode $\textbf{X}$ into embeddings $\textbf{Z}^{(1)}$ and $\textbf{Z}^{(2)}$, and jointly optimize the Global Feature Distribution Alignment (GDA), Neutral Contrastive Distribution Alignment (NCA), and Adaptive Feature Consistency Neutral Contrastive Learning (AFC) objectives. Specifically, GDA facilitates global information sharing. NCA adopts the Neutral Contrastive Factor Estimation (NCFE) technique, which leverages $\textbf{A}$ and the pairwise node similarity $\textbf{S}$ to estimate a coarse-grained neutral contrastive factor $\eta$, thereby enabling neutral contrastive learning and enhancing the interaction of local neighborhood information. AFC employs $\textbf{S}$, $\textbf{A}$, and reliable pseudo-labels to construct a high-confidence graph $\textbf{H}$, which is subsequently used to enhance the feature consistency $s(\cdot)$ of node embeddings.
  • Figure 2: Computational cost comparison of our NeuCGC against seven CGC methods and two conventional DGC methods on four datasets.
  • Figure 3: Training curves of (a) the homophily ratio $r_h$ and (b) the graph neighborhood congener ratio $\delta$ for both the original graph and the learned high-confidence graph $\textbf{H}$. On the Cora, DBLP, Wisconsin, and Cornell datasets, $\textbf{H}$ improves $\delta$ by approximately 4.4, 14.3, 7.8, and 3.3 times, respectively, compared with the original graph. The homophily of $\textbf{H}$ is also higher than that of the original graph on all datasets, indicating that the neighbors in $\textbf{H}$ are of higher quality than those in the original graph.
  • Figure 4: t-SNE visualization comparison on (a) DBLP and (b) ACM datasets.
  • Figure 5: Sensitivity analysis of loss weighting factors $\lambda_1$ and $\lambda_2$.
  • ...and 2 more figures
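Figure 1 states that AFC builds the high-confidence graph $\textbf{H}$ from $\textbf{S}$, $\textbf{A}$, and reliable pseudo-labels, but the exact rule is not given here. The sketch below shows one plausible rule under explicit assumptions: a node is reliable if its pseudo-label confidence reaches a threshold, its candidate neighbors are its original neighbors plus its `k` most similar nodes, and an edge is kept only when both endpoints are reliable and share a pseudo-label. The function `build_high_confidence_graph` and the parameters `k` and `threshold` are hypothetical.

```python
def build_high_confidence_graph(adj, sim, labels, conf, k=1, threshold=0.9):
    """Hypothetical construction of a high-confidence graph H.

    adj: list of sets; adj[i] holds the neighbors of node i in A
    sim: n x n nested list of pairwise similarities S
    labels: pseudo-label of each node (e.g. from clustering the embeddings)
    conf: confidence of each pseudo-label, in [0, 1]
    k: number of most-similar nodes added as extra candidates
    threshold: minimum confidence for a node to count as reliable
    Returns H as an undirected list of sets.
    """
    n = len(adj)
    reliable = [c >= threshold for c in conf]
    H = [set() for _ in range(n)]
    for i in range(n):
        if not reliable[i]:
            continue
        row = sim[i]
        # Candidates: original neighbors plus the k most similar other nodes.
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: row[j], reverse=True)
        candidates = set(adj[i]) | set(ranked[:k])
        for j in candidates:
            # Keep only pairs of distinct reliable nodes sharing a pseudo-label.
            if j != i and reliable[j] and labels[i] == labels[j]:
                H[i].add(j)
                H[j].add(i)
    return H


# Toy graph: edge 0-2 joins different pseudo-labels; node 3 has low confidence.
adj = [{1, 2}, {0}, {0}, set()]
sim = [[1.0, 0.9, 0.1, 0.2],
       [0.9, 1.0, 0.1, 0.1],
       [0.1, 0.1, 1.0, 0.8],
       [0.2, 0.1, 0.8, 1.0]]
pseudo_labels = [0, 0, 1, 1]
confidence = [0.95, 0.95, 0.95, 0.5]
H = build_high_confidence_graph(adj, sim, pseudo_labels, confidence)
```

In this toy example, the edge between nodes 0 and 2 (whose pseudo-labels differ) is dropped and the low-confidence node 3 is excluded, which gives a small-scale intuition for why a graph built this way can be more homophilic than the input graph.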

Theorems & Definitions (6)

  • Definition 1: Homophily Ratio (Zhu et al., 2020)
  • Definition 2: Neighborhood Homophily Ratio (Pei et al., 2020)
  • Definition 3: Neutral Pair
  • Definition 4: Graph Neighborhood Congener Ratio
  • Theorem 1
  • Proof of Theorem 1
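Only the titles of Definitions 1 and 2 appear above. Assuming they follow the usual forms from the cited works, the homophily ratio is the fraction of edges whose endpoints share a class label (edge homophily), and the neighborhood homophily ratio averages, over nodes, the fraction of each node's neighbors that share its label (node homophily). A short plain-Python sketch of both follows; the function names are illustrative, and skipping isolated nodes is an assumption.

```python
from collections import defaultdict


def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a class label."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def node_homophily(edges, labels):
    """Average, over nodes, of the fraction of neighbors sharing the node's label.

    Isolated nodes have no neighbors and are skipped (an assumption).
    """
    neighbors = defaultdict(list)
    for u, v in edges:  # undirected edges, each listed once
        neighbors[u].append(v)
        neighbors[v].append(u)
    ratios = [
        sum(1 for u in nbrs if labels[u] == labels[v]) / len(nbrs)
        for v, nbrs in neighbors.items()
    ]
    return sum(ratios) / len(ratios) if ratios else 0.0


# Path graph 0-1-2-3 with labels [0, 0, 1, 1]: one of three edges is heterophilic.
edges = [(0, 1), (1, 2), (2, 3)]
labels = [0, 0, 1, 1]
r_edge = edge_homophily(edges, labels)  # 2/3
r_node = node_homophily(edges, labels)  # (1 + 1/2 + 1/2 + 1) / 4 = 0.75
```

The two ratios differ even on this tiny graph because node homophily weights every node equally, whereas edge homophily weights every edge equally.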