GCL-OT: Graph Contrastive Learning with Optimal Transport for Heterophilic Text-Attributed Graphs

Yating Ren, Yikun Ban, Huobin Tan

TL;DR

GCL-OT tackles multi-granular heterophily in text-attributed graphs by integrating optimal transport into graph contrastive learning to align structure and text representations bidirectionally. It introduces RealSoftMax-based similarity for partial heterophily, a prompt-based filter to mitigate complete heterophily, and OT-guided latent homophily mining to uncover hidden semantically related neighbors. Theoretical analysis shows tighter mutual information bounds and improved Bayes error guarantees, while extensive experiments on nine TAG benchmarks demonstrate robust, state-of-the-art performance across supervised and unsupervised settings. The framework remains effective with different PLMs and exhibits strong resilience to perturbations, highlighting practical applicability for real-world TAG tasks.
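To make the optimal transport ingredient concrete, the following is a minimal sketch of entropy-regularized OT (Sinkhorn iterations) between a set of structure embeddings and a set of text embeddings. It is not the authors' implementation: the uniform marginals, the cosine-distance cost, the regularization strength `epsilon`, and the names `sinkhorn_plan`, `h_struct`, and `h_text` are all assumptions chosen for illustration.

```python
import numpy as np

def sinkhorn_plan(cost, epsilon=0.5, n_iters=200):
    """Entropy-regularized OT plan between uniform marginals (Sinkhorn iterations).

    Illustrative only: uniform marginals and this epsilon are assumptions.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)       # uniform mass on structure-side items
    b = np.full(m, 1.0 / m)       # uniform mass on text-side items
    K = np.exp(-cost / epsilon)   # Gibbs kernel derived from the cost matrix
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)           # rescale rows toward marginal a
        v = b / (K.T @ u)         # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]

# Hypothetical embeddings: 4 structure-side vectors, 5 text-side vectors.
rng = np.random.default_rng(0)
h_struct = rng.normal(size=(4, 8))
h_text = rng.normal(size=(5, 8))
h_struct /= np.linalg.norm(h_struct, axis=1, keepdims=True)
h_text /= np.linalg.norm(h_text, axis=1, keepdims=True)

cost = 1.0 - h_struct @ h_text.T  # cosine distance, values in [0, 2]
plan = sinkhorn_plan(cost)        # soft alignment between the two views
```

Each entry `plan[i, j]` can be read as the amount of mass moved from structure item `i` to text item `j`, so the plan acts as a soft alignment instead of a hard one-to-one match.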

Abstract

Recently, structure-text contrastive learning has shown promising performance on text-attributed graphs by leveraging the complementary strengths of graph neural networks and language models. However, existing methods typically rely on homophily assumptions in similarity estimation and hard optimization objectives, which limit their applicability to heterophilic graphs. Although existing methods can mitigate heterophily through structural adjustments or neighbor aggregation, they usually treat textual embeddings as static targets, leading to suboptimal alignment. In this work, we identify the multi-granular heterophily in text-attributed graphs, including complete heterophily, partial heterophily, and latent homophily, which makes structure-text alignment particularly challenging due to mixed, noisy, and missing semantic correlations. To achieve flexible and bidirectional alignment, we propose GCL-OT, a novel graph contrastive learning framework with optimal transport, equipped with tailored mechanisms for each type of heterophily. Specifically, for partial heterophily, we design a RealSoftMax-based similarity estimator to emphasize key neighbor-word interactions while easing background noise. For complete heterophily, we introduce a prompt-based filter that adaptively excludes irrelevant noise during optimal transport alignment. Furthermore, we incorporate OT-guided soft supervision to uncover potential neighbors with similar semantics, enhancing the learning of latent homophily. Theoretical analysis shows that GCL-OT can improve the mutual information bound and Bayes error guarantees. Extensive experiments on nine benchmarks show that GCL-OT consistently outperforms state-of-the-art methods, verifying its effectiveness and robustness.
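As a rough illustration of the RealSoftMax idea, the sketch below uses the common definition of RealSoftMax as a scaled log-sum-exp, which behaves like a smooth maximum. How the paper aggregates over neighbors and words is not stated in this abstract, so pooling a whole neighbor-by-word similarity matrix into one score, the temperature `beta`, and the example matrix `sim` are assumptions.

```python
import numpy as np

def real_softmax(x, beta=10.0, axis=None):
    """Scaled log-sum-exp: (1/beta) * log(sum(exp(beta * x))).

    Acts as a smooth upper bound on max(x); larger beta tracks the max more closely.
    """
    z = beta * np.asarray(x, dtype=float)
    z_max = np.max(z, axis=axis, keepdims=True)  # subtract the max for numerical stability
    lse = z_max + np.log(np.sum(np.exp(z - z_max), axis=axis, keepdims=True))
    out = lse / beta
    return np.squeeze(out, axis=axis) if axis is not None else out.item()

# Hypothetical similarities between 3 neighbors (rows) and 4 words (columns).
sim = np.array([
    [0.9, 0.1, 0.0, -0.2],
    [0.2, 0.1, 0.0, 0.1],
    [-0.1, 0.0, 0.1, 0.0],
])

score = real_softmax(sim, beta=10.0)          # one pooled score for the whole matrix
per_neighbor = real_softmax(sim, beta=10.0, axis=1)  # one score per neighbor
```

Compared with mean pooling, the pooled score is dominated by the strongest neighbor-word pair, so a single informative interaction is not averaged away by many weak ones.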

Paper Structure

This paper contains 53 sections, 4 theorems, 71 equations, 8 figures, 7 tables, 1 algorithm.

Key Result

Proposition 1

Let $\mathcal{L}_{\text{InfoNCE}}$ denote the standard InfoNCE loss between structural embeddings $\mathbf{H}^{\zeta}$ and textual embeddings $\mathbf{H}^{t}$. According to the standard InfoNCE MI lower bound [oord2019infonce], $\mathcal{L}_{\text{LHM}}$ provides a tighter variational lower bound than
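
For context, the standard InfoNCE bound that this proposition builds on is usually written as below. The symbol $N$ (the number of candidates contrasted per anchor, one positive plus $N-1$ negatives) is notation assumed here and does not appear in the excerpt above; the exact statement of the proposition should be taken from the paper itself.

$$
I\left(\mathbf{H}^{\zeta}; \mathbf{H}^{t}\right) \;\geq\; \log N - \mathcal{L}_{\text{InfoNCE}}
$$

Under this reading, a smaller contrastive loss certifies a larger lower bound on the mutual information between the two views, which is the sense in which one loss can yield a "tighter" bound than another.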

Figures (8)

  • Figure 1: Examples and empirical analysis of multi-granular heterophily in TAGs. Node colors denote categories. $H_N / H_E$: node/edge heterophily, $R_{\text{NTD}} / R_{\text{NWD}}$: neighbor token/sentence dissimilarity, $R_{\text{UTS}}$: similarity of unconnected nodes.
  • Figure 2: Overview of GCL-OT. Given a TAG, an LLM enriches node texts, a PLM encodes the enriched texts, and a GNN captures structure features. The text and structure views form a similarity matrix, where RealSoftMax highlights fine-grained interactions and the filter prompt suppresses coarse-grained noise. The contrastive module then aligns the two views and uncovers latent homophily. Finally, the fused embeddings drive node prediction.
  • Figure 3: Improvements over InfoNCE across various heterophily metrics on Cora.
  • Figure 4: t-SNE visualization of node representations learned by different models on Cora (a–c) and Texas (d–f), with colors indicating ground-truth class labels.
  • Figure 5: Evaluation of model robustness under text and edge perturbations on Cora.
  • ...and 3 more figures

Theorems & Definitions (9)

  • Proposition 1
  • Proposition 2
  • Theorem 1
  • proof
  • proof
  • proof
  • proof
  • Lemma 2: Bayes error and conditional entropy
  • proof