Exploring the Potential of Large Language Models for Heterophilic Graphs

Yuxia Wu, Shujie Li, Yuan Fang, Chuan Shi

TL;DR

This paper tackles node classification on heterophilic graphs by introducing LLM4HeG, a two-stage framework that leverages large language models to discriminate edge types and to guide edge-aware message passing. Stage 1 trains an LLM (via LoRA) to distinguish homophilic vs. heterophilic edges using node texts, while Stage 2 uses LLM-informed edge weights, combined with graph-based signals, to drive adaptive GNN aggregation. To address practicality, the authors distill the LLM’s heterophily knowledge into smaller language models, enabling faster inference with minimal performance loss. Experiments on five real-world datasets show consistent improvements over baselines, with distillation achieving competitive results and strong cross-backbone gains, highlighting the potential of LLMs to enrich semantic and structural reasoning in heterophilic graphs.

Abstract

Large language models (LLMs) have presented significant opportunities to enhance various machine learning applications, including graph neural networks (GNNs). By leveraging the vast open-world knowledge within LLMs, we can more effectively interpret and utilize textual data to better characterize heterophilic graphs, where neighboring nodes often have different labels. However, existing approaches for heterophilic graphs overlook the rich textual data associated with nodes, which could unlock deeper insights into their heterophilic contexts. In this work, we explore the potential of LLMs for modeling heterophilic graphs and propose a novel two-stage framework: LLM-enhanced edge discriminator and LLM-guided edge reweighting. In the first stage, we fine-tune the LLM to better identify homophilic and heterophilic edges based on the textual content of their nodes. In the second stage, we adaptively manage message propagation in GNNs for different edge types based on node features, structures, and heterophilic or homophilic characteristics. To cope with the computational demands when deploying LLMs in practical scenarios, we further explore model distillation techniques to fine-tune smaller, more efficient models that maintain competitive performance. Extensive experiments validate the effectiveness of our framework, demonstrating the feasibility of using LLMs to enhance node classification on heterophilic graphs.
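The second stage described above can be illustrated with a small sketch. It is not the authors' implementation: the function names, the fixed signed weights (`w_homo`, `w_hetero`), and the normalization are assumptions chosen for illustration, and the edge-type flags are derived from ground-truth labels here, where the framework would use the predictions of the fine-tuned edge discriminator. The sketch first measures how heterophilic a toy graph is (the fraction of edges joining same-label nodes) and then runs one round of aggregation in which homophilic neighbors pull a node's representation toward them and heterophilic neighbors push it away.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a label."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def signed_aggregate(features, edges, is_homophilic,
                     w_homo=1.0, w_hetero=-1.0, self_weight=1.0):
    """One round of edge-type-aware aggregation on an undirected graph.

    Assumption: homophilic edges get a positive weight and heterophilic
    edges a negative one; the paper's actual weighting may differ.
    """
    dim = len(next(iter(features.values())))
    # Start from each node's own (weighted) feature vector.
    out = {n: [self_weight * x for x in f] for n, f in features.items()}
    norm = {n: abs(self_weight) for n in features}
    for (u, v), homo in zip(edges, is_homophilic):
        w = w_homo if homo else w_hetero
        # Undirected edge: pass the message in both directions.
        for src, dst in ((u, v), (v, u)):
            for i in range(dim):
                out[dst][i] += w * features[src][i]
            norm[dst] += abs(w)
    # Normalize by the total absolute weight received.
    return {n: [x / norm[n] for x in vec] for n, vec in out.items()}


# Toy graph: nodes 0 and 1 share class "A", node 2 has class "B".
labels = {0: "A", 1: "A", 2: "B"}
features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
edges = [(0, 1), (1, 2), (0, 2)]

# Stand-in for the stage-1 discriminator output (hypothetical).
is_homophilic = [labels[u] == labels[v] for u, v in edges]

ratio = edge_homophily(edges, labels)
updated = signed_aggregate(features, edges, is_homophilic)
```

On this toy graph only one of the three edges is homophilic, and after aggregation the two class-"A" nodes stay close to each other while node 2 moves in the opposite direction, which is the intuition behind treating the two edge types differently during message passing.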

Paper Structure

This paper contains 20 sections, 8 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Overall framework of the proposed method LLM4HeG.
  • Figure 2: The effectiveness of learnable weight.
  • Figure 3: The accuracy of inductive node classification.
  • Figure 4: Analysis on the efficiency of the fine-tuned LLM and distilled SLMs.
  • Figure 5: Effect of the edge weight margin $\alpha$.
  • ...and 1 more figure