Glance for Context: Learning When to Leverage LLMs for Node-Aware GNN-LLM Fusion

Donald Loveland, Yao-An Yang, Danai Koutra

TL;DR

The paper tackles the inefficiency of uniform LLM-GNN fusion on text-attributed graphs (TAGs) by introducing GLANCE, a node-aware framework that learns when to route a node to an LLM. It computes lightweight routing features, including local homophily estimates, and, because the LLM calls are non-differentiable, trains the router with counterfactual rewards that balance accuracy against LLM cost. GLANCE generates multi-hop LLM embeddings for routed neighborhoods and refines GNN predictions through a fusion head, improving results across the homophily spectrum and scaling to large TAGs with few LLM queries. Adaptive, structure-aware routing yields large gains on challenging nodes (up to $+13\%$ on heterophilous pockets) while preserving strong overall performance, offering a practical path to scalable GNN-LLM integration.
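
To make the "local homophily" routing feature concrete, here is a minimal sketch that assumes the standard definition: the fraction of a node's neighbors that share its label. How GLANCE estimates this quantity when true labels are unavailable is not described in this summary, so the function name `local_homophily`, the edge-list input, and the toy graph are illustrative assumptions rather than the authors' implementation.

```python
from collections import defaultdict


def local_homophily(edges, labels):
    """Return {node: fraction of neighbors sharing the node's label}.

    Assumption: the graph is undirected and `labels` holds one label
    (true or predicted) per node. Isolated nodes map to None.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    scores = {}
    for node, label in labels.items():
        nbrs = neighbors.get(node, set())
        if not nbrs:
            scores[node] = None  # homophily is undefined without neighbors
            continue
        same = sum(1 for n in nbrs if labels[n] == label)
        scores[node] = same / len(nbrs)
    return scores


# Toy graph: node 0 sits in a heterophilous pocket (1 of 3 neighbors agrees).
edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
labels = {0: "A", 1: "A", 2: "B", 3: "B", 4: "B"}
h = local_homophily(edges, labels)
```

In this toy graph, node 0 has the lowest nonzero score, which is the kind of node the summary describes as a candidate for LLM assistance.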

Abstract

Learning on text-attributed graphs has motivated the use of Large Language Models (LLMs) for graph learning. However, most fusion strategies are applied uniformly across all nodes and attain only small overall performance gains. We argue this result stems from aggregate metrics that obscure when LLMs provide benefit, inhibiting actionable signals for new strategies. In this work, we reframe LLM-GNN fusion around nodes where GNNs typically falter. We first show that performance can significantly differ between GNNs and LLMs, with each excelling on distinct structural patterns, such as local homophily. To leverage this finding, we propose GLANCE (GNN with LLM Assistance for Neighbor- and Context-aware Embeddings), a framework that invokes an LLM to refine a GNN's prediction. GLANCE employs a lightweight router that, given inexpensive per-node signals, decides whether to query the LLM. Since the LLM calls are non-differentiable, the router is trained with an advantage-based objective that compares the utility of querying the LLM against relying solely on the GNN. Across multiple benchmarks, GLANCE achieves the best performance balance across node subgroups, achieving significant gains on heterophilous nodes (up to $+13\%$) while simultaneously achieving top overall performance. Our findings highlight the value of adaptive, node-aware GNN-LLM architectures, where selectively invoking the LLM enables scalable deployment on large graphs without incurring high computational costs.
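
The advantage-based idea in the abstract can be sketched as a generic policy-gradient (REINFORCE-style) update for a Bernoulli routing policy. This is not the paper's exact objective, which this summary does not state. The logistic router, the utility values, the `query_cost` term, and all function names below are assumptions made for illustration.

```python
import math


def route_probability(weights, bias, features):
    """Probability that a logistic router sends this node to the LLM."""
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def routing_advantage(gnn_utility, llm_utility, query_cost):
    """Utility of querying the LLM relative to relying on the GNN alone.

    Assumption: utilities are scalars (e.g. 1.0 if correct, 0.0 if not)
    and `query_cost` is a hypothetical penalty per LLM call.
    """
    return (llm_utility - gnn_utility) - query_cost


def policy_gradient_step(weights, bias, features, action, advantage, lr=1.0):
    """One REINFORCE update for the sampled action (1 = route, 0 = skip).

    For a Bernoulli policy, d log pi(action) / d logit = action - p.
    With the GNN-only outcome as the baseline, a skipped node has zero
    advantage, so callers pass advantage=0.0 when action == 0.
    """
    p = route_probability(weights, bias, features)
    scale = lr * advantage * (action - p)
    new_weights = [w + scale * x for w, x in zip(weights, features)]
    return new_weights, bias + scale


# A low-homophily node where the GNN is wrong and the LLM is right.
features = [0.2]
adv = routing_advantage(gnn_utility=0.0, llm_utility=1.0, query_cost=0.1)
before = route_probability([0.0], 0.0, features)
weights, bias = policy_gradient_step([0.0], 0.0, features, action=1, advantage=adv)
after = route_probability(weights, bias, features)
```

Because the update only needs the observed utilities, no gradient has to flow through the LLM call itself. A positive advantage raises the routing probability for similar nodes, and a negative one (the LLM adds nothing but still costs a query) lowers it.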

Paper Structure

This paper contains 60 sections, 12 equations, 9 figures, 10 tables.

Figures (9)

  • Figure 1: Stratified performance for $h_{v}$ (top) and $\bar{d_{v}}$ (bottom). Bars denote property distributions (right y-axis). While enhanced GNNs can benefit heterophilous and low-degree nodes, LLMs offer further improvements.
  • Figure 2: GLANCE Overview. Step 1: GLANCE generates routing features to derive a decision. Step 2: A routed node's text is fed into the LLM to generate embeddings. Step 3: A routed node's GNN and LLM embeddings are used to refine predictions. For nodes not routed, the GNN prediction is used. Only the router and refiner MLP are trained.
  • Figure 3: Local homophily for routed nodes, split by benefit. Blue lines denote median.
  • Figure 4: Stratified Performance. Performance is given for local homophily (top) and relative degree (bottom); bars denote property distributions (right y-axis).
  • Figure 5: Stratified Performance Based on Homophily and Degree. Darker red denotes instances where the LLM performs best, and darker blue denotes instances where the GNN performs best. Across methods, performance deviates further than it does under either metric alone.
  • ...and 4 more figures
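
The three steps in the Figure 2 caption can be sketched as a simple control flow: route, embed with the LLM only when routed, then refine. Every callable below (`should_route`, `gnn_predict`, `gnn_embed`, `llm_embed`, `refine`) is a hypothetical placeholder for a component whose details are not given in this summary. The stubs in the usage part only show that unrouted nodes never trigger an LLM call.

```python
def predict_with_routing(nodes, should_route, gnn_predict, gnn_embed,
                         llm_embed, refine):
    """Return {node: prediction}, querying the LLM only for routed nodes."""
    predictions = {}
    for node in nodes:
        if should_route(node):                      # Step 1: routing decision
            llm_vec = llm_embed(node)               # Step 2: LLM embedding
            predictions[node] = refine(gnn_embed(node), llm_vec)  # Step 3
        else:
            predictions[node] = gnn_predict(node)   # GNN prediction as-is
    return predictions


# Usage with stand-in components that record which nodes reach the LLM.
llm_calls = []


def fake_llm_embed(node):
    llm_calls.append(node)
    return [1.0]


preds = predict_with_routing(
    nodes=[0, 1, 2],
    should_route=lambda n: n == 2,
    gnn_predict=lambda n: "gnn",
    gnn_embed=lambda n: [0.0],
    llm_embed=fake_llm_embed,
    refine=lambda gnn_vec, llm_vec: "refined",
)
```

Under this arrangement the LLM cost grows with the number of routed nodes rather than with the size of the graph.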