GMLM: Bridging Graph Neural Networks and Language Models for Heterophilic Node Classification
Aarush Sinha
TL;DR
The paper addresses how to integrate pre-trained language models (PLMs) with graph neural networks (GNNs) on text-rich heterophilic graphs. It proposes the Graph Masked Language Model (GMLM), a two-branch architecture in which a multi-scale GNN and a PLM-based text branch are fused through a bi-directional cross-attention mechanism. Training proceeds in two stages: contrastive GNN pre-training, then end-to-end fine-tuning with active node selection. Empirically, GMLM achieves state-of-the-art results on four of five heterophilic benchmarks, with notable gains on the Texas and Wisconsin datasets, and the results indicate that a carefully designed fusion is more effective and resource-efficient than simply scaling up large language models. The work highlights the practical value of deep, bidirectional integration for text-rich graph representation learning, and it outlines limitations and directions for future work on text-scarce graphs and robustness.
Abstract
Integrating Pre-trained Language Models (PLMs) with Graph Neural Networks (GNNs) remains a central challenge in text-rich heterophilic graph learning. We propose a novel integration framework that enables effective fusion between powerful pre-trained text encoders and Relational Graph Convolutional Networks (R-GCNs). Our method enhances the alignment of textual and structural representations through a bidirectional fusion mechanism and contrastive node-level optimization. To evaluate the approach, we train two variants using different PLMs: Snowflake-Embed (state-of-the-art) and GTE-base, each paired with an R-GCN backbone. Experiments on five heterophilic benchmarks demonstrate that our integration method achieves state-of-the-art results on four datasets, surpassing existing GNN and large language model-based approaches. Notably, Snowflake-Embed + R-GCN improves accuracy on the Texas dataset by over 8% and on Wisconsin by nearly 5%. These results highlight the effectiveness of our fusion strategy for advancing text-rich graph representation learning.
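
To make the idea of bidirectional fusion concrete, the following is a minimal sketch, not the paper's implementation. It assumes each node already has a graph embedding (e.g. from the R-GCN) and a text embedding (e.g. from the PLM) of the same dimension, and it uses plain single-head scaled dot-product attention with no learned projections. The function names (`cross_attend`, `bidirectional_fuse`), the residual connection, and the final concatenation are illustrative assumptions rather than details taken from the source.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Scaled dot-product attention: every query row attends over all
    rows of keys_values, which serve as both keys and values
    (assumption: no learned projection matrices)."""
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, keys_values))
             for j in range(dim)]
        )
    return attended


def bidirectional_fuse(graph_emb, text_emb):
    """Hypothetical fusion: graph embeddings attend to text embeddings and
    vice versa; each direction is added residually to its own input and
    the two results are concatenated per node."""
    graph_from_text = cross_attend(graph_emb, text_emb)
    text_from_graph = cross_attend(text_emb, graph_emb)
    fused = []
    for g, g_att, t, t_att in zip(graph_emb, graph_from_text,
                                  text_emb, text_from_graph):
        graph_side = [x + y for x, y in zip(g, g_att)]
        text_side = [x + y for x, y in zip(t, t_att)]
        fused.append(graph_side + text_side)
    return fused


# Toy inputs: three nodes with two-dimensional embeddings (made-up values).
graph_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_emb = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
fused = bidirectional_fuse(graph_emb, text_emb)
```

In this sketch each node ends up with a vector of twice the input dimension, which a downstream classifier could consume. A real system would more likely use learned query, key, and value projections with multiple heads.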
