The Structure-Content Trade-off in Knowledge Graph Retrieval
Valentin Six, Evan Dufraisse, Gaël de Chalendar
TL;DR
This work studies how retrieval design affects LLM reasoning in knowledge graph question answering (KGQA). It introduces a parametric hybrid retrieval function that interpolates between focusing on the initial question and focusing on its subquestions via a weight $\alpha \in [0,1]$. The function combines per-subquestion top-$k$ retrieval with Prize-Collecting Steiner Tree (PCST) pruning to produce a connected subgraph $G^*$, and this framework makes the resulting balance between structure and content directly analyzable. QA accuracy peaks at intermediate values of $\alpha$ (roughly $0.3$–$0.7$), where the retrieved subgraph is highly connected while its content remains relevant. These findings offer practical guidance for designing KGQA retrieval strategies: what is retrieved and how it is connected should be balanced adaptively to support reliable multi-hop reasoning over structured knowledge.
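As a rough illustration of the interpolation described above, the following Python sketch scores each candidate knowledge-graph element against both the initial question and one subquestion, then keeps the top-$k$ elements per subquestion and takes their union. It is a minimal sketch under stated assumptions, not the authors' implementation: embeddings are given as plain vectors, similarity is cosine, the two scores are assumed to be combined linearly, and $\alpha = 1$ is assumed to mean pure initial-question focus. The names `hybrid_score` and `retrieve` are hypothetical, and the PCST pruning step that yields $G^*$ is omitted.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_score(item: Vector, question: Vector, subquestion: Vector, alpha: float) -> float:
    # Assumption: linear interpolation, alpha = 1 -> initial question only,
    # alpha = 0 -> subquestion only.
    return alpha * cosine(item, question) + (1.0 - alpha) * cosine(item, subquestion)


def retrieve(items: list[Vector], question: Vector,
             subquestions: list[Vector], alpha: float, k: int) -> set[int]:
    """Union of the top-k item indices retrieved for each subquestion."""
    selected: set[int] = set()
    for sub in subquestions:
        scores = [hybrid_score(x, question, sub, alpha) for x in items]
        ranked = sorted(range(len(items)), key=lambda i: scores[i], reverse=True)
        selected.update(ranked[:k])
    return selected


# Toy embeddings: item 0 matches the question, item 1 the subquestion, item 2 both.
items = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
question, subquestions = [1.0, 0.0], [[0.0, 1.0]]
for alpha in (1.0, 0.5, 0.0):
    print(alpha, retrieve(items, question, subquestions, alpha, k=1))
# 1.0 {0}
# 0.5 {2}
# 0.0 {1}
```

On these toy vectors, each extreme of $\alpha$ selects a candidate relevant to only one side, while $\alpha = 0.5$ selects the candidate relevant to both the question and the subquestion.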
Abstract
Large Language Models (LLMs) increasingly rely on knowledge graphs for factual reasoning, yet how retrieval design shapes their performance remains unclear. We examine how question decomposition changes the content and structure of the retrieved subgraph. Using a hybrid retrieval function that controls the relative weight of the initial question and its subquestions, we show that subquestion-based retrieval improves content precision but yields disjoint subgraphs, whereas question-based retrieval preserves structure at the cost of relevance. Optimal performance arises between these extremes, revealing that balancing retrieval content and structure is key to effective LLM reasoning over structured knowledge.
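The two quantities contrasted in the abstract can be made concrete with a small sketch: content precision as the fraction of retrieved triples that appear in a gold set, and structure as the number of connected components in the retrieved subgraph (1 means fully connected). These metric definitions, the function names, and the toy triples are illustrative assumptions, not the paper's evaluation protocol.

```python
Triple = tuple[str, str, str]  # (head, relation, tail)


def count_components(triples: list[Triple]) -> int:
    """Connected components, treating each triple as an undirected head-tail edge."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for head, _relation, tail in triples:
        root_h, root_t = find(head), find(tail)
        if root_h != root_t:
            parent[root_h] = root_t  # merge the two components
    return len({find(node) for node in list(parent)})


def precision(retrieved: list[Triple], gold: set[Triple]) -> float:
    """Fraction of retrieved triples that are in the gold set."""
    if not retrieved:
        return 0.0
    return sum(t in gold for t in retrieved) / len(retrieved)


gold = {("A", "r1", "B"), ("B", "r2", "C"), ("C", "r3", "D")}
# Hypothetical subquestion-focused result: precise but fragmented.
sub_focused = [("A", "r1", "B"), ("C", "r3", "D")]
# Hypothetical question-focused result: connected but with off-topic triples.
q_focused = [("A", "r1", "B"), ("B", "r2", "C"), ("C", "r3", "D"),
             ("B", "r4", "E"), ("E", "r5", "F")]

print(precision(sub_focused, gold), count_components(sub_focused))  # 1.0 2
print(precision(q_focused, gold), count_components(q_focused))      # 0.6 1
```

Union-find keeps the component count linear-time in practice; in a real pipeline the same check could be run on the subgraph before and after pruning.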
