Efficient Multi-Hop Question Answering over Knowledge Graphs via LLM Planning and Embedding-Guided Search
Manil Shrestha, Edward Kim
TL;DR
This work tackles verifiable multi-hop question answering over knowledge graphs (KGs) with two complementary hybrid approaches: LLM-Guided Planning, which uses a single planning step to generate relation sequences grounded in the KG, and Embedding-Guided Neural Search, a lightweight, LLM-free method that fuses text and graph embeddings to score edges. The planning approach delivers near-perfect accuracy with strong grounding and transferability, while the embedding-based method achieves speedups of over 100x with competitive accuracy, enabling scalable, API-free inference. The authors further show that planning capability can be distilled into a small 4B-parameter model via LoRA, eliminating API costs while maintaining grounding. On MetaQA, grounded reasoning consistently outperforms ungrounded generation, and the right inductive biases, which combine symbolic structure with learned representations, allow scalable, auditable KGQA without reliance on massive LLMs.
Abstract
Multi-hop question answering over knowledge graphs remains computationally challenging due to the combinatorial explosion of possible reasoning paths. Recent approaches rely on expensive Large Language Model (LLM) inference for both entity linking and path ranking, limiting their practical deployment. Additionally, LLM-generated answers often lack verifiable grounding in structured knowledge. We present two complementary hybrid algorithms that address both efficiency and verifiability: (1) LLM-Guided Planning, which uses a single LLM call to predict relation sequences that are then executed via breadth-first search, achieving near-perfect accuracy (micro-F1 > 0.90) while ensuring all answers are grounded in the knowledge graph, and (2) Embedding-Guided Neural Search, which eliminates LLM calls entirely by fusing text and graph embeddings through a lightweight 6.7M-parameter edge scorer, achieving a speedup of over 100x with competitive accuracy. Through knowledge distillation, we compress planning capability into a 4B-parameter model that matches large-model performance at zero API cost. Evaluation on MetaQA demonstrates that grounded reasoning consistently outperforms ungrounded generation, with structured planning proving more transferable than direct answer generation. Our results show that verifiable multi-hop reasoning does not require massive models at inference time, but rather the right architectural inductive biases combining symbolic structure with learned representations.
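
As a rough illustration of how a predicted relation sequence can be executed breadth-first so that every answer is grounded in the graph, consider the minimal sketch below. It is not the authors' implementation: the toy triples, the relation names (including the `_inv` suffix for inverse edges), the helper names `build_index` and `execute_plan`, and the hard-coded plan standing in for the single LLM call are all assumptions made for this example.

```python
from collections import defaultdict


def build_index(triples):
    """Map (head, relation) -> set of tails, adding inverse edges.

    The "_inv" naming for inverse relations is an assumption of this sketch.
    """
    index = defaultdict(set)
    for head, relation, tail in triples:
        index[(head, relation)].add(tail)
        index[(tail, relation + "_inv")].add(head)
    return index


def execute_plan(index, start_entity, relation_sequence):
    """Follow the relation sequence hop by hop, breadth-first.

    Each hop expands the whole current frontier along one relation, so
    every returned entity is reachable in the graph via the plan.
    """
    frontier = {start_entity}
    for relation in relation_sequence:
        next_frontier = set()
        for entity in frontier:
            next_frontier |= index.get((entity, relation), set())
        frontier = next_frontier
        if not frontier:  # the plan does not match any path in the graph
            break
    return frontier


# Toy knowledge graph (hypothetical data in a MetaQA-like style).
triples = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Interstellar", "directed_by", "Christopher Nolan"),
    ("Inception", "starred_actors", "Leonardo DiCaprio"),
]
index = build_index(triples)

# Question: "Who directed the films that Leonardo DiCaprio starred in?"
# In the planning approach this sequence would come from one LLM call;
# here it is hard-coded for illustration.
plan = ["starred_actors_inv", "directed_by"]
answers = execute_plan(index, "Leonardo DiCaprio", plan)
print(answers)
```

Because the answer set is produced only by traversing existing edges, a plan that names a relation absent from the graph yields an empty set instead of an unsupported answer.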
