Efficient Multi-Hop Question Answering over Knowledge Graphs via LLM Planning and Embedding-Guided Search

Manil Shrestha, Edward Kim

TL;DR

This work tackles verifiable multi-hop question answering over knowledge graphs with two complementary hybrid approaches: LLM-Guided Planning, which uses a single LLM call to predict a relation sequence that is then executed over the KG, and Embedding-Guided Neural Search, a lightweight, LLM-free method that fuses text and graph embeddings to score candidate edges. The planning approach delivers near-perfect accuracy (micro-F1 > 0.90) with answers grounded in the KG and good transferability, while the embedding-based method achieves a speedup of over 100x with competitive accuracy, enabling scalable, API-free inference. The authors further show that planning capability can be distilled into a 4B-parameter model via LoRA fine-tuning, eliminating API costs while maintaining grounding. On MetaQA, grounded reasoning consistently outperforms ungrounded generation, and the right inductive biases, combining symbolic structure with learned representations, allow scalable, auditable KGQA without relying on massive LLMs at inference time.

Abstract

Multi-hop question answering over knowledge graphs remains computationally challenging due to the combinatorial explosion of possible reasoning paths. Recent approaches rely on expensive Large Language Model (LLM) inference for both entity linking and path ranking, limiting their practical deployment. Additionally, LLM-generated answers often lack verifiable grounding in structured knowledge. We present two complementary hybrid algorithms that address both efficiency and verifiability: (1) LLM-Guided Planning that uses a single LLM call to predict relation sequences executed via breadth-first search, achieving near-perfect accuracy (micro-F1 > 0.90) while ensuring all answers are grounded in the knowledge graph, and (2) Embedding-Guided Neural Search that eliminates LLM calls entirely by fusing text and graph embeddings through a lightweight 6.7M-parameter edge scorer, achieving over 100 times speedup with competitive accuracy. Through knowledge distillation, we compress planning capability into a 4B-parameter model that matches large-model performance at zero API cost. Evaluation on MetaQA demonstrates that grounded reasoning consistently outperforms ungrounded generation, with structured planning proving more transferable than direct answer generation. Our results show that verifiable multi-hop reasoning does not require massive models at inference time, but rather the right architectural inductive biases combining symbolic structure with learned representations.
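
The plan-then-execute idea in (1) can be illustrated with a minimal sketch. It assumes the LLM has already returned a relation sequence, and it follows that sequence level by level over a tiny in-memory graph. The triples, the relation names (`written_by`, `has_genre`, the `_reversed` suffix), the helper names, and the choice to drop the start entity from later frontiers are all illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

def build_index(triples):
    """Index (head, relation) -> tails, adding a reversed relation per edge."""
    index = defaultdict(set)
    for head, rel, tail in triples:
        index[(head, rel)].add(tail)
        index[(tail, rel + "_reversed")].add(head)
    return index

def execute_plan(index, start, relations, exclude_start=True):
    """Expand the frontier one hop per planned relation (breadth-first).

    Every returned entity is reachable through stored triples, so the
    answer set is grounded in the graph by construction.
    """
    frontier = {start}
    for rel in relations:
        # Follow only edges labelled with the planned relation.
        frontier = {t for node in frontier for t in index.get((node, rel), ())}
        if exclude_start:
            # Assumption: "other movies" means the start entity is dropped.
            frontier.discard(start)
    return frontier

# Toy triples for illustration only (not taken from MetaQA).
triples = [
    ("Inception", "written_by", "Christopher Nolan"),
    ("Memento", "written_by", "Christopher Nolan"),
    ("Inception", "has_genre", "Sci-Fi"),
    ("Memento", "has_genre", "Thriller"),
]
index = build_index(triples)

# Hypothetical planner output for the question
# "what genres are the other movies by the writers of Inception?"
plan = ["written_by", "written_by_reversed", "has_genre"]
answers = execute_plan(index, "Inception", plan)
print(answers)
```

Because the plan restricts each hop to a single relation, the frontier grows only along matching edges instead of across every outgoing edge, which is what keeps the search from exploding combinatorially.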

Paper Structure

This paper contains 30 sections, 1 equation, 5 figures, 4 tables, 2 algorithms.

Figures (5)

  • Figure 1: Path pruning in knowledge graph exploration. Starting from the movie Inception, the goal is to find genres of other movies written by the same writers. Two strategies are discussed in this paper: (a) LLM-planned reasoning, where the model predicts the relation sequence 'written by → written by reversed → has genres'; and (b) multimodal edge scoring, which prunes paths using graph and text embeddings for efficient, LLM-free traversal.
  • Figure 2: Architecture of the proposed hybrid edge-scoring model. The question text, current node, and candidate edges/nodes are encoded using both text and graph embedding models. A hybrid fusion layer combines text and graph embeddings for each candidate triple. The fused representations, together with the text embedding and hop context, are passed through an attention MLP to compute relevance weights. The resulting attention-weighted features are fed into a classifier MLP, which outputs a score for each candidate edge.
  • Figure 3: Hybrid Fusion: Text and graph embeddings are projected into a shared space and adaptively combined via a learned gating MLP to form hybrid embeddings that integrate semantic and structural information.
  • Figure 4: Training and validation loss for the Embedding-Guided Search model (see the neural edge scorer subsection).
  • Figure 5: Training and validation loss for LoRA fine-tuning of the Qwen3-4B model using 10K training examples.
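
The hybrid fusion described for Figure 3 can be sketched roughly as follows. The caption only states that text and graph embeddings are projected into a shared space and combined through a learned gating MLP; the specific form used here (a per-dimension sigmoid gate that interpolates between the two projections), the layer sizes, the random untrained weights, and all function names are assumptions made for illustration.

```python
import math
import random

def linear(weights, bias, x):
    """Apply a dense layer: weights is a list of rows, x is a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def random_layer(rng, out_dim, in_dim):
    """Small random weights standing in for learned parameters."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hybrid_fuse(text_emb, graph_emb, params):
    """Project both embeddings to a shared space and mix them with a gate."""
    t = linear(*params["text_proj"], text_emb)
    g = linear(*params["graph_proj"], graph_emb)
    # Gating MLP over the concatenated projections (one ReLU hidden layer).
    hidden = [max(0.0, h) for h in linear(*params["gate_hidden"], t + g)]
    gate = [sigmoid(z) for z in linear(*params["gate_out"], hidden)]
    # Assumed mixing rule: gate * text + (1 - gate) * graph, per dimension.
    fused = [a * ti + (1.0 - a) * gi for a, ti, gi in zip(gate, t, g)]
    return fused, gate

rng = random.Random(0)
TEXT_DIM, GRAPH_DIM, SHARED_DIM, HIDDEN_DIM = 8, 4, 6, 5  # hypothetical sizes
params = {
    "text_proj": random_layer(rng, SHARED_DIM, TEXT_DIM),
    "graph_proj": random_layer(rng, SHARED_DIM, GRAPH_DIM),
    "gate_hidden": random_layer(rng, HIDDEN_DIM, 2 * SHARED_DIM),
    "gate_out": random_layer(rng, SHARED_DIM, HIDDEN_DIM),
}
text_emb = [rng.gauss(0.0, 1.0) for _ in range(TEXT_DIM)]
graph_emb = [rng.gauss(0.0, 1.0) for _ in range(GRAPH_DIM)]

fused, gate = hybrid_fuse(text_emb, graph_emb, params)
print(len(fused), [round(a, 3) for a in gate])
```

A gate near 1 leans on the semantic (text) signal for that dimension and a gate near 0 leans on the structural (graph) signal, which is one plausible reading of "adaptively combined".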