An LLM-Guided Query-Aware Inference System for GNN Models on Large Knowledge Graphs

Waleed Afandi, Hussein Abdallah, Ashraf Aboulnaga, Essam Mansour

TL;DR

KG-WISE achieves up to 28x faster inference and 98% lower memory usage than state-of-the-art systems while maintaining or improving accuracy across both commercial and open-weight LLMs.

Abstract

Efficient inference for graph neural networks (GNNs) on large knowledge graphs (KGs) is essential for many real-world applications. GNN inference queries are computationally expensive and vary in complexity, as each involves a different number of target nodes linked to subgraphs of diverse densities and structures. Existing acceleration methods, such as pruning, quantization, and knowledge distillation, instantiate smaller models but do not adapt them to the structure or semantics of individual queries. They also store models as monolithic files that must be fully loaded, and miss the opportunity to retrieve only the neighboring nodes and corresponding model components that are semantically relevant to the target nodes. These limitations lead to excessive data loading and redundant computation on large KGs. This paper presents KG-WISE, a task-driven inference paradigm for large KGs. KG-WISE decomposes trained GNN models into fine-grained components that can be partially loaded based on the structure of the queried subgraph. It employs large language models (LLMs) to generate reusable query templates that extract semantically relevant subgraphs for each task, enabling query-aware and compact model instantiation. We evaluate KG-WISE on six large KGs with up to 42 million nodes and 166 million edges. KG-WISE achieves up to 28x faster inference and 98% lower memory usage than state-of-the-art systems while maintaining or improving accuracy across both commercial and open-weight LLMs.

Paper Structure

This paper contains 21 sections, 3 equations, 11 figures, 4 tables, and 4 algorithms.

Figures (11)

  • Figure 1: A GNN inference query for target nodes ($TN$) loads the KG’s adjacency, model, and embedding matrices from storage, then performs message passing and embedding updates to produce predictions. This process is resource-intensive and scales poorly with the size of these matrices in KGs.
  • Figure 2: KG-WISE orchestrates training and inference on large KGs through LLM-guided subgraph extraction, fine-grained model storage, and query-aware model instantiation.
  • Figure 3: The top half shows an encoded KG and a GNN trained on it. The GNN's main components are the learned parameters and non-target node embeddings. The bottom half illustrates how KG-WISE maps the encoding and decouples the GNN components. The node embeddings are stored as row-wise chunks, grouped by node type, in a Zarr KV store.
  • Figure 4: KG-WISE inference pipeline. Given an inference query, KG-WISE loads a stored SPARQL template to extract a semantically relevant subgraph $SG$, then instantiates a compact model $\widetilde{M}$ by loading only the required embeddings and weights from the KV store. Inference is executed on demand using sparse tensor aggregation over $SG$, avoiding full model loading.
  • Figure 5: Performance across NC tasks on three metrics: (A) Inference Accuracy (higher is better), (B) Inference Time (lower is better), and (C) Inference Memory (lower is better). The top and middle sections show the results for the Paper-Venue task on DBLP and MAG, respectively. The bottom section shows the results for the Place-Country task on YAGO4. Each inference query covers 1K target nodes stratified across all classes. KG-WISE achieves inference accuracy comparable to the SOTA training/inference accelerators (GraphSAINT, IBMB, GCNP, and DQ). KG-WISE outperforms the SOTA methods by up to 28x in inference time on YAGO4, with memory reduction of up to 98%.
  • ...and 6 more figures
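
The partial-loading idea described in the captions of Figures 3 and 4 can be illustrated with a minimal Python sketch. It is not the KG-WISE implementation and rests on several assumptions: a toy in-memory dictionary stands in for the Zarr KV store, the subgraph is assumed to be already extracted (the SPARQL template step is omitted), and plain mean aggregation stands in for the paper's sparse tensor aggregation. All names (`ChunkedEmbeddingStore`, `load_subgraph_embeddings`, `mean_aggregate`) are hypothetical.

```python
from collections import defaultdict


class ChunkedEmbeddingStore:
    """Toy stand-in for a chunked KV store (hypothetical, not the Zarr API).

    Embeddings are kept as row-wise chunks keyed by (node_type, chunk_id).
    """

    def __init__(self, chunk_rows=2):
        self.chunk_rows = chunk_rows
        self._chunks = {}      # (node_type, chunk_id) -> list of embedding rows
        self.chunks_read = 0   # counts chunk fetches, to show partial loading

    def put(self, node_type, rows):
        # Split the embedding matrix of one node type into row-wise chunks.
        for start in range(0, len(rows), self.chunk_rows):
            key = (node_type, start // self.chunk_rows)
            self._chunks[key] = rows[start:start + self.chunk_rows]

    def total_chunks(self):
        return len(self._chunks)

    def get_rows(self, node_type, row_ids):
        # Fetch only the chunks that cover the requested rows.
        by_chunk = defaultdict(list)
        for rid in row_ids:
            by_chunk[rid // self.chunk_rows].append(rid)
        out = {}
        for chunk_id, rids in by_chunk.items():
            chunk = self._chunks[(node_type, chunk_id)]
            self.chunks_read += 1
            for rid in rids:
                out[rid] = chunk[rid % self.chunk_rows]
        return out


def load_subgraph_embeddings(store, edges):
    """Load embeddings only for source nodes that appear in the subgraph."""
    needed = defaultdict(set)
    for (node_type, row_id), _dst in edges:
        needed[node_type].add(row_id)
    embeddings = {}
    for node_type, row_ids in needed.items():
        for rid, vec in store.get_rows(node_type, sorted(row_ids)).items():
            embeddings[(node_type, rid)] = vec
    return embeddings


def mean_aggregate(edges, embeddings, targets):
    """One round of mean neighbor aggregation for the target nodes."""
    sums, counts = {}, defaultdict(int)
    for src, dst in edges:
        if dst not in targets:
            continue
        vec = embeddings[src]
        prev = sums.get(dst, [0.0] * len(vec))
        sums[dst] = [a + b for a, b in zip(prev, vec)]
        counts[dst] += 1
    return {t: [x / counts[t] for x in s] for t, s in sums.items()}


# Toy data: 6 author rows (3 chunks) and 2 venue rows (1 chunk).
store = ChunkedEmbeddingStore(chunk_rows=2)
store.put("author", [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0],
                     [4.0, 4.0], [9.0, 9.0], [7.0, 7.0]])
store.put("venue", [[5.0, 5.0], [6.0, 6.0]])

# Assumed output of subgraph extraction: two authors linked to one target paper.
target = ("paper", 0)
subgraph_edges = [(("author", 0), target), (("author", 1), target)]

emb = load_subgraph_embeddings(store, subgraph_edges)
pred_input = mean_aggregate(subgraph_edges, emb, {target})
print(store.chunks_read, "of", store.total_chunks(), "chunks read")
print(pred_input[target])
```

In this toy setup only the chunk holding the two referenced author rows is fetched, while the venue chunk and the remaining author chunks are never touched. That is the kind of saving the captions attribute to query-aware model instantiation.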