The Structure-Content Trade-off in Knowledge Graph Retrieval

Valentin Six, Evan Dufraisse, Gaël de Chalendar

TL;DR

This work studies how retrieval design affects LLM reasoning over knowledge graphs in knowledge graph question answering (KGQA). It introduces a parametric hybrid retrieval function that uses $\alpha \in [0,1]$ to interpolate between a focus on the initial question ($\alpha = 0$) and a focus on the subquestions ($\alpha = 1$), combined with per-subquestion top-$k$ retrieval and Prize-Collecting Steiner Tree pruning to produce a connected subgraph $G^*$; this framework makes it possible to analyze the resulting structure-content balance. QA accuracy is maximized at intermediate $\alpha$ values (roughly $0.3$–$0.7$), which keep connectivity high while preserving content relevance. The findings offer practical guidance for designing retrieval strategies in KGQA: adaptively balance what is retrieved against how it is connected, to support reliable multi-hop reasoning over structured knowledge.

Abstract

Large Language Models (LLMs) increasingly rely on knowledge graphs for factual reasoning, yet how retrieval design shapes their performance remains unclear. We examine how question decomposition changes the retrieved subgraph's content and structure. Using a hybrid retrieval function that controls the relative importance of the initial question and its subquestions, we show that subquestion-based retrieval improves content precision but yields disjoint subgraphs, while question-based retrieval maintains structure at the cost of relevance. Optimal performance arises between these extremes, revealing that balancing retrieval content and structure is key to effective LLM reasoning over structured knowledge.
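
To make the role of $\alpha$ concrete, here is a minimal sketch of one way such a hybrid retrieval function could look. The exact scoring formula is not given in this summary, so the linear interpolation of cosine similarities below is an assumption, as are the function names `cosine`, `hybrid_scores`, and `retrieve` and the toy vectors standing in for embeddings. The Prize-Collecting Steiner Tree pruning step is deliberately left out; the sketch only covers scoring and per-subquestion top-$k$ selection.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def hybrid_scores(question_vec, subquestion_vec, item_vecs, alpha):
    """Score each graph item against a blend of question and subquestion.

    ASSUMPTION: a linear interpolation of similarities. alpha = 0 uses only
    the initial question, alpha = 1 uses only the subquestion.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {
        item: (1.0 - alpha) * cosine(question_vec, vec)
        + alpha * cosine(subquestion_vec, vec)
        for item, vec in item_vecs.items()
    }


def retrieve(question_vec, subquestion_vecs, item_vecs, alpha, k):
    """Union of the top-k items retrieved for each subquestion.

    The result would then be handed to a pruning step (not implemented here)
    to obtain a connected subgraph.
    """
    selected = set()
    for subquestion_vec in subquestion_vecs:
        scores = hybrid_scores(question_vec, subquestion_vec, item_vecs, alpha)
        # Sort by descending score; break ties by item name for determinism.
        ranked = sorted(scores, key=lambda item: (-scores[item], item))
        selected.update(ranked[:k])
    return selected
```

With $\alpha = 0$, every subquestion retrieves the same items, those closest to the initial question, while with $\alpha = 1$ each subquestion retrieves independently. That matches the intuition that one extreme favors a coherent, connected selection and the other favors locally relevant but potentially disjoint content.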

Paper Structure

This paper contains 7 sections, 3 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Effect of $\alpha$ on subgraph structure (blue) and content (green). Lower $\alpha$ (focusing on the initial question) increases connectivity and density but lowers content relevance; higher $\alpha$ (focusing on subquestions) has the opposite effect.
  • Figure 2: Examples of retrieved subgraphs for two setups: $\alpha = 0$ (focusing on the initial question) and $\alpha = 1$ (focusing on subquestions).
  • Figure 3: QA accuracy peaks at intermediate $\alpha$, showing the benefit of balancing structure and content. Note that $\alpha = 0$ corresponds to the setup in G-Retriever.