GMLM: Bridging Graph Neural Networks and Language Models for Heterophilic Node Classification

Aarush Sinha

TL;DR

The paper tackles the problem of integrating pre-trained language models (PLMs) with graph neural networks (GNNs) on text-rich heterophilic graphs. It proposes the Graph Masked Language Model (GMLM), a two-branch architecture in which a multi-scale GNN branch and a PLM-based text branch are fused through a bidirectional cross-attention mechanism. The model is trained in two stages: contrastive pre-training of the GNN, followed by end-to-end fine-tuning with active node selection. Empirically, GMLM achieves state-of-the-art results on four of five heterophilic benchmarks, with notable gains on the Texas and Wisconsin datasets, and shows that a carefully designed fusion is more effective and resource-efficient than simply scaling up large language models. The work highlights the practical value of deep, bidirectional integration for text-rich graph representation learning, and outlines limitations and directions for future work on text-scarce graphs and on improving robustness.
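The exact contrastive objective used in the GNN pre-training stage is not given in this summary. As a rough illustration only, the sketch below assumes a standard symmetric InfoNCE (NT-Xent-style) loss: two views of the same node form a positive pair, and every other node in the batch serves as a negative. The function name `info_nce`, the temperature of 0.5, and the use of NumPy are assumptions, not the paper's implementation.

```python
import numpy as np

def info_nce(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.5) -> float:
    """Symmetric InfoNCE loss between two views of N node embeddings (hypothetical helper).

    Row i of z1 and row i of z2 are a positive pair; all other rows of the
    opposite view act as negatives.
    """
    # L2-normalise rows so dot products become cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # (N, N) similarity matrix

    def log_softmax(x: np.ndarray) -> np.ndarray:
        # Numerically stable row-wise log-softmax
        x = x - x.max(axis=1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

    # Positives lie on the diagonal; average the view1->view2 and view2->view1 terms
    loss_12 = -np.diag(log_softmax(logits)).mean()
    loss_21 = -np.diag(log_softmax(logits.T)).mean()
    return float((loss_12 + loss_21) / 2)

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce(z, z)                           # identical views: positives match exactly
unrelated = info_nce(z, rng.normal(size=(8, 16)))  # independent views: no shared signal
print(f"aligned={aligned:.3f} unrelated={unrelated:.3f}")
```

Minimising such a loss pulls the two views of each node together while pushing apart different nodes, which is the general behaviour a contrastive pre-training stage relies on.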

Abstract

Integrating Pre-trained Language Models (PLMs) with Graph Neural Networks (GNNs) remains a central challenge in text-rich heterophilic graph learning. We propose a novel integration framework that effectively fuses powerful pre-trained text encoders with Relational Graph Convolutional Networks (R-GCNs). Our method improves the alignment of textual and structural representations through a bidirectional fusion mechanism and contrastive node-level optimization. To evaluate the approach, we train two variants with different PLMs, the state-of-the-art Snowflake-Embed and GTE-base, each paired with an R-GCN backbone. Experiments on five heterophilic benchmarks show that our integration method achieves state-of-the-art results on four datasets, surpassing existing GNN-based and large language model-based approaches. Notably, Snowflake-Embed + R-GCN improves accuracy on the Texas dataset by over 8% and on Wisconsin by nearly 5%. These results highlight the effectiveness of our fusion strategy for advancing text-rich graph representation learning.
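For readers unfamiliar with the R-GCN backbone, one standard R-GCN layer updates node i as h_i' = σ(W_0 h_i + Σ_r Σ_{j ∈ N_i^r} (1 / c_{i,r}) W_r h_j): neighbour messages are transformed with a separate weight matrix per relation type r, normalised by c_{i,r} = |N_i^r|, and added to a self-connection. The minimal NumPy sketch below implements one such layer, assuming ReLU as σ and no basis decomposition. How GMLM defines relation types on these benchmarks is not stated here, so `rgcn_layer` and the edge triples are hypothetical and purely illustrative.

```python
import numpy as np

def rgcn_layer(h, edges, weights, self_weight):
    """One R-GCN propagation step (hypothetical helper).

    h:           (N, d_in) node features
    edges:       list of (src, dst, relation) triples; messages flow src -> dst
    weights:     (R, d_in, d_out), one matrix W_r per relation
    self_weight: (d_in, d_out), the self-connection matrix W_0
    """
    n, num_rel = h.shape[0], weights.shape[0]
    out = h @ self_weight  # self-connection term W_0 h_i (a new array)
    # c_{i,r}: number of incoming neighbours of node i under relation r
    counts = np.zeros((n, num_rel))
    for _, dst, r in edges:
        counts[dst, r] += 1
    # Relation-specific, count-normalised messages from each neighbour
    for src, dst, r in edges:
        out[dst] += (h[src] @ weights[r]) / counts[dst, r]
    return np.maximum(out, 0.0)  # ReLU as the nonlinearity sigma

h = np.eye(3)                              # one-hot features for 3 nodes
edges = [(0, 2, 0), (1, 2, 0), (0, 1, 1)]  # (src, dst, relation)
out = rgcn_layer(h, edges, np.ones((2, 3, 2)), np.zeros((3, 2)))
print(out)
```

In this toy graph, node 2 averages two relation-0 messages, node 1 receives a single relation-1 message, and node 0 has no incoming edges, so only its (zero) self-connection term remains.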

Paper Structure

This paper contains 28 sections, 11 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: The text associated with each node is fed into the text encoder, while the node features are processed by R-GCN layers whose multi-layer embeddings are fused together. The outputs of the graph-to-text and text-to-graph cross-attention layers are then concatenated by a fusion layer, and the resulting model is trained and evaluated.
  • Figure 2: UMAP visualization of node embeddings using the GTE-base model (PLM backbone). The three panels show the raw features, the embeddings after contrastive pre-training, and the embeddings after full fine-tuning.
  • Figure 3: UMAP visualization of node embeddings using the larger Snowflake embedding model (PLM backbone). These panels show a clear improvement in cluster separation after fine-tuning compared to the GTE-base model in Figure 2.
  • Figure 4: UMAP visualization of node embeddings on the Texas dataset using the GTE-base model. The plots show the progression from raw features to fine-tuned embeddings.
  • Figure 5: UMAP visualization of node embeddings on the Texas dataset using the Snowflake embedding model. A clear improvement in cluster quality is visible compared to the GTE-base model shown in Figure 4.
  • ...and 2 more figures
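The fusion path described in Figure 1 (graph-to-text and text-to-graph cross-attention, concatenation, then a fusion layer) can be illustrated with a small single-head NumPy sketch. It rests on several assumptions the caption does not state: each node contributes one pooled graph embedding and one pooled text embedding, attention runs across the nodes in a batch, there are no residual connections or layer norms, and the fusion layer is a single linear map to class logits. All names (`cross_attention`, `bidirectional_fusion`, the `params` keys) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_src, kv_src, wq, wk, wv):
    """Single-head scaled dot-product attention: rows of q_src attend over rows of kv_src."""
    q, k, v = q_src @ wq, kv_src @ wk, kv_src @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v

def bidirectional_fusion(g, t, params):
    """Fuse graph embeddings g (N, d) with text embeddings t (N, d) into class logits."""
    g2t = cross_attention(g, t, *params["g2t"])  # graph queries, text keys/values
    t2g = cross_attention(t, g, *params["t2g"])  # text queries, graph keys/values
    fused = np.concatenate([g2t, t2g], axis=-1)  # (N, 2d)
    return fused @ params["w_fuse"] + params["b_fuse"]  # (N, num_classes)

rng = np.random.default_rng(0)
n, d, num_classes = 5, 8, 3
g = rng.normal(size=(n, d))  # stand-in for R-GCN node embeddings
t = rng.normal(size=(n, d))  # stand-in for PLM text embeddings
params = {
    "g2t": tuple(rng.normal(size=(d, d)) for _ in range(3)),
    "t2g": tuple(rng.normal(size=(d, d)) for _ in range(3)),
    "w_fuse": rng.normal(size=(2 * d, num_classes)),
    "b_fuse": np.zeros(num_classes),
}
logits = bidirectional_fusion(g, t, params)
print(logits.shape)  # (5, 3)
```

Running attention in both directions lets each modality query the other before the two results are concatenated, which is the distinction the summary draws between bidirectional fusion and a one-way combination of text and graph features.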