Scaling GraphLLM with Bilevel-Optimized Sparse Querying

Yangzhe Peng; Haiquan Qiu; Quanming Yao; Kun He

TL;DR

Bilevel-Optimized Sparse Querying (BOSQ) tackles the scalability bottleneck of GraphLLMs on text-attributed graphs (TAGs) by learning a sparse, task-driven querying policy that limits LLM calls to a fixed budget of $K$ nodes. The method is formulated as a bilevel optimization: the outer level learns node importance scores $\boldsymbol{\lambda}$ to select informative nodes, while the inner level trains a GNN on the augmented node features. A differentiable top-$K$ mask $\mathbf{m}$ is obtained via a straight-through estimator, and LLM/LM parameters are kept frozen. Hypergradients are computed through the Implicit Function Theorem with a Neumann-series approximation, enabling end-to-end learning with a complexity dominated by the LLM queries, $O(K\cdot C_{\text{LLM}})$. Experiments on six real-world TAGs show that BOSQ delivers orders-of-magnitude speedups over dense-querying GraphLLMs, scales to the million-node ogbn-products graph with competitive accuracy, and learns node-importance scores that transfer across model sizes. BOSQ thus makes GraphLLMs practical for large-scale TAGs by using LLM explanations efficiently without sacrificing task performance.
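The top-$K$ selection with a straight-through estimator can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the identity surrogate gradient, the plain gradient-descent update, and the toy numbers are all assumptions made for illustration.

```python
def top_k_mask(scores, k):
    """Forward pass: hard 0/1 mask with ones at the k largest scores."""
    if not 0 <= k <= len(scores):
        raise ValueError("k must lie between 0 and the number of nodes")
    # Sort node indices by score, highest first; ties keep the lower index.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    chosen = set(order[:k])
    return [1.0 if i in chosen else 0.0 for i in range(len(scores))]


def straight_through_grad(grad_wrt_mask):
    """Backward pass (assumed surrogate): treat d(mask)/d(scores) as identity."""
    return list(grad_wrt_mask)


def outer_step(scores, grad_wrt_mask, lr):
    """One gradient-descent update of the importance scores."""
    grad_wrt_scores = straight_through_grad(grad_wrt_mask)
    return [s - lr * g for s, g in zip(scores, grad_wrt_scores)]


# Toy usage: 5 nodes, query budget K = 2.
lam = [0.2, 0.9, 0.1, 0.6, 0.5]
mask = top_k_mask(lam, k=2)  # nodes 1 and 3 would be sent to the LLM

# Hypothetical gradient of the outer loss with respect to each mask entry;
# a negative value means that querying the node would lower the loss.
g_mask = [0.0, 0.1, 0.0, 0.3, -0.4]
lam_new = outer_step(lam, g_mask, lr=1.0)
new_mask = top_k_mask(lam_new, k=2)  # node 4 replaces node 3
```

The hard mask keeps the number of LLM calls at exactly $K$ in the forward pass, while the surrogate gradient lets the scores of unqueried nodes change so that they can enter the selection later.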

Abstract

LLMs have recently shown strong potential in enhancing node-level tasks on text-attributed graphs (TAGs) by providing explanation features. However, their practical use is severely limited by the high computational and monetary cost of repeated LLM queries. To illustrate, naively generating explanations for all nodes on a medium-sized benchmark like Photo (48k nodes) using a representative method (e.g., TAPE) would consume days of processing time. In this paper, we propose Bilevel-Optimized Sparse Querying (BOSQ), a general framework that selectively leverages LLM-derived explanation features to enhance performance on node-level tasks on TAGs. We design an adaptive sparse querying strategy that selectively decides when to invoke LLMs, avoiding redundant or low-gain queries and significantly reducing computation overhead. Extensive experiments on six real-world TAG datasets involving two types of node-level tasks demonstrate that BOSQ achieves orders of magnitude speedups over existing GraphLLM methods while consistently delivering on-par or superior performance.

Paper Structure (30 sections, 2 theorems, 29 equations, 2 figures, 10 tables, 3 algorithms)

Introduction
Method
Preliminary
Bilevel-Optimized Sparse Querying
Task-driven Optimization and Implementation
Advantages over Alternative Approaches
Experiment
Experimental Setup
Overall Performance Comparison (Q1)
Scaling up to Million-Scale Graph (Q2)
Ablation Study of Sparse Selection Strategy (Q3)
Transferability of Importance Scores (Q4)
Related Works
Conclusion
IFT proof
...and 15 more sections

Key Result

Theorem 2.1

If for a given hyperparameter $\boldsymbol{\lambda}'$, the model parameters $\mathbf{w}' = \mathbf{w}^*(\boldsymbol{\lambda}')$ satisfy the stationarity condition and the training Hessian $\mathbf{H}_{\mathbf{w}\mathbf{w}} := \frac{\partial^2 \mathcal{L}_T}{\partial \mathbf{w} \partial \mathbf{w}^T}$ is invertible at $(\boldsymbol{\lambda}', \mathbf{w}')$, then the best-response function $\mathbf{w}^*$ ...
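For orientation, the standard implicit-function-theorem result under these two conditions can be written as follows. This is the textbook form rather than a quotation of the paper's statement; the outer (validation) loss $\mathcal{L}_V$, the step size $\alpha$, and the truncation order $J$ are notational assumptions introduced here.

Differentiating the stationarity condition $\frac{\partial \mathcal{L}_T}{\partial \mathbf{w}}(\boldsymbol{\lambda}, \mathbf{w}^*(\boldsymbol{\lambda})) = \mathbf{0}$ with respect to $\boldsymbol{\lambda}$ gives the Jacobian of the best response:

$$
\frac{\partial \mathbf{w}^*}{\partial \boldsymbol{\lambda}^T}\bigg|_{\boldsymbol{\lambda}'} = -\mathbf{H}_{\mathbf{w}\mathbf{w}}^{-1}\,\frac{\partial^2 \mathcal{L}_T}{\partial \mathbf{w}\,\partial \boldsymbol{\lambda}^T}
$$

Applying the chain rule to the outer loss, and using the symmetry of the Hessian, yields the hypergradient:

$$
\frac{d \mathcal{L}_V}{d \boldsymbol{\lambda}} = \frac{\partial \mathcal{L}_V}{\partial \boldsymbol{\lambda}} - \frac{\partial^2 \mathcal{L}_T}{\partial \boldsymbol{\lambda}\,\partial \mathbf{w}^T}\,\mathbf{H}_{\mathbf{w}\mathbf{w}}^{-1}\,\frac{\partial \mathcal{L}_V}{\partial \mathbf{w}}
$$

The inverse Hessian is usually not formed explicitly. A truncated Neumann series replaces it, and the series converges as $J \to \infty$ when $\|\mathbf{I} - \alpha \mathbf{H}_{\mathbf{w}\mathbf{w}}\| < 1$:

$$
\mathbf{H}_{\mathbf{w}\mathbf{w}}^{-1} \approx \alpha \sum_{j=0}^{J} \left(\mathbf{I} - \alpha \mathbf{H}_{\mathbf{w}\mathbf{w}}\right)^{j}
$$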

Figures (2)

Figure 1: BOSQ significantly outperforms existing GraphLLM frameworks in efficiency without sacrificing task performance. By transitioning from Dense Querying to Sparse Querying via bilevel optimization, BOSQ reduces total end-to-end execution time by orders of magnitude compared to TAPE. Our method (red star) sits at the ideal top-left region, offering a practical and scalable solution for large-scale TAGs where traditional GraphLLM baselines are infeasible. Please refer to Table \ref{tab:clf-exp} for detailed results.
Figure 2: Overview of the BOSQ framework. Our method treats node selection as a bilevel optimization problem. The outer loop (purple) learns an Adaptive Selector to identify a sparse subset of nodes that benefit most from LLM explanations, while the inner loop (blue) optimizes a GNN on the resulting augmented graph. By using hypergradients to guide the selection mask $\mathbf{m}$, BOSQ selectively invokes the LLM only for critical nodes ($K \ll |\mathcal{V}|$). This mechanism achieves a superior trade-off between task performance and computational efficiency compared to previous dense querying baselines (see Comparison box).
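The hypergradients that guide the mask require an inverse-Hessian-vector product, which the TL;DR says is approximated with a Neumann series. The sketch below shows that approximation on a toy 2x2 problem. It is an illustration under stated assumptions, not the paper's code: the explicit matrix `H`, the step size `alpha`, the number of terms, and the function names are hypothetical, and a practical implementation would typically use Hessian-vector products from automatic differentiation instead of a stored matrix.

```python
def mat_vec(matrix, vec):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


def neumann_inverse_hvp(hessian, vec, alpha, num_terms):
    """Approximate H^{-1} v by alpha * sum_{j=0}^{num_terms} (I - alpha*H)^j v.

    The series converges when the spectral radius of (I - alpha*H) is below 1.
    """
    term = list(vec)   # holds (I - alpha*H)^j v, starting at j = 0
    total = list(vec)  # running sum of the terms
    for _ in range(num_terms):
        hv = mat_vec(hessian, term)
        term = [t - alpha * h for t, h in zip(term, hv)]
        total = [s + t for s, t in zip(total, term)]
    return [alpha * s for s in total]


# Toy symmetric positive-definite Hessian and right-hand side.
H = [[2.0, 0.5],
     [0.5, 1.0]]
v = [1.0, 1.0]

approx = neumann_inverse_hvp(H, v, alpha=0.4, num_terms=200)
# Multiplying back by H should recover v if the approximation is accurate.
residual = [hv - x for hv, x in zip(mat_vec(H, approx), v)]
```

Each added term costs one Hessian-vector product, so the number of terms trades approximation accuracy against compute.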

Theorems & Definitions (3)

Theorem 2.1: Implicit Function Theorem (IFT) Solution
Theorem 1.1: Implicit Function Theorem (IFT) Solution
proof
