
Mitigating KG Quality Issues: A Robust Multi-Hop GraphRAG Retrieval Framework

Yizhuo Ma, Shuang Liang, Rongzheng Wang, Jiakai, Qizhi Chen, Muquan Li, Ke Qin

Abstract

Graph Retrieval-Augmented Generation (GraphRAG) enhances multi-hop reasoning but relies on imperfect knowledge graphs (KGs) that frequently suffer from inherent quality issues. Current approaches often overlook these issues and consequently struggle with retrieval drift driven by spurious noise and with retrieval hallucinations stemming from incomplete information. To address these challenges, we propose C2RAG (Constraint-Checked Retrieval-Augmented Generation), a framework for robust multi-hop retrieval over imperfect KGs. First, C2RAG performs constraint-based retrieval by decomposing each query into atomic constraint triples and using fine-grained constraint anchoring to filter candidates, thereby suppressing retrieval drift. Second, C2RAG introduces a sufficiency check that explicitly prevents retrieval hallucinations: it decides whether the current evidence is sufficient to justify structural propagation and activates textual recovery otherwise. Extensive experiments on multi-hop benchmarks demonstrate that C2RAG consistently outperforms the latest baselines, by 3.4% EM and 3.9% F1 on average, while exhibiting improved robustness under KG quality issues.
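To make the constraint-based retrieval step concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a KG stored as a list of (head, relation, tail) strings, placeholders written as `?x`, and exact string matching for anchors. The names `Constraint` and `execute_constraint` are hypothetical, and contextual reranking is left out.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail) stored in the KG


@dataclass
class Constraint:
    # One atomic constraint triple; strings starting with "?" are placeholders.
    subject: str
    relation_variants: Set[str]  # hypothetical paraphrases of the query relation
    obj: str


def is_placeholder(term: str) -> bool:
    return term.startswith("?")


def execute_constraint(c: Constraint, kg: List[Triple], top_k: int = 5) -> List[str]:
    """Return candidate bindings for the constraint's single placeholder.

    Relation filtering: keep triples whose relation is one of the variants.
    Anchor matching: the non-placeholder end must equal the KG entity exactly.
    Contextual reranking is omitted; candidates keep KG order.
    """
    cands: List[str] = []
    for h, r, t in kg:
        if r not in c.relation_variants:
            continue  # relation filtering
        if is_placeholder(c.obj) and h == c.subject:
            cands.append(t)  # anchored on the subject, bind the object
        elif is_placeholder(c.subject) and t == c.obj:
            cands.append(h)  # anchored on the object, bind the subject
    # De-duplicate while preserving order, then truncate to the budget.
    return list(dict.fromkeys(cands))[:top_k]


# Toy KG and a two-constraint query:
# "Where is the company that designed the A15 Bionic headquartered?"
kg = [
    ("A15 Bionic", "designed_by", "Apple Inc."),
    ("A15 Bionic", "manufactured_by", "TSMC"),
    ("Apple Inc.", "headquartered_in", "Cupertino"),
]
c1 = Constraint("A15 Bionic", {"designed_by", "developed_by"}, "?x")
x = execute_constraint(c1, kg)  # ["Apple Inc."]
c2 = Constraint(x[0], {"headquartered_in", "based_in"}, "?y")
y = execute_constraint(c2, kg)  # ["Cupertino"]
```

Chaining the two calls mirrors how an induced binding (`Apple Inc.`) becomes the anchor of the next constraint. If the bridge edge `designed_by` were missing, as in the Category B example of Figure 1, the first call would return an empty list; that is the situation the sufficiency check and textual recovery are meant to handle.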

Paper Structure

This paper contains 37 sections, 3 theorems, 22 equations, 5 figures, and 8 tables.

Key Result

Lemma 1.1

For any distribution $p$ on $\mathcal{C}$, $1 \le N_{\mathrm{eff}}(p) \le n$, where $n = |\mathcal{C}|$. Moreover, $N_{\mathrm{eff}}(p)=1$ if $p$ is a point mass, and $N_{\mathrm{eff}}(p)=n$ if $p$ is uniform.
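The excerpt does not state how $N_{\mathrm{eff}}$ is defined. A common choice with exactly these extremal values is the inverse Simpson index $N_{\mathrm{eff}}(p) = 1/\sum_{c \in \mathcal{C}} p(c)^2$; the exponential of the Shannon entropy behaves the same way at the extremes. The sketch below assumes the inverse Simpson form (the function name `n_eff` is hypothetical) and checks the lemma's cases numerically.

```python
from typing import Sequence


def n_eff(p: Sequence[float]) -> float:
    """Effective number of candidates.

    ASSUMPTION: inverse Simpson index 1 / sum_c p(c)^2; the paper's exact
    definition is not shown in this excerpt.
    """
    total = sum(p)
    if total <= 0 or any(x < 0 for x in p):
        raise ValueError("p must be non-negative with positive total mass")
    q = [x / total for x in p]  # normalize defensively
    return 1.0 / sum(x * x for x in q)


n = 4
point_mass = [1.0, 0.0, 0.0, 0.0]
uniform = [1.0 / n] * n
skewed = [0.7, 0.2, 0.05, 0.05]

print(n_eff(point_mass))  # 1.0
print(n_eff(uniform))     # 4.0
assert 1.0 <= n_eff(skewed) <= n  # the range claim of the lemma
```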

Figures (5)

  • Figure 1: Illustrative examples of typical quality issues in LLM-constructed KGs. Category A (Spurious Noise) introduces triples that contradict the provenance text, such as (i) an over-generalized relation (e.g., "nominated" mis-extracted as "won"), (ii) a mis-bound relation between entities (e.g., two entities linked by an incorrect generic edge), and (iii) a semantic flip (e.g., "not associated" extracted as "associated"). Category B (Incomplete Information) omits graph elements required for multi-hop evidence chaining, such as a missing bridge edge (e.g., the absent edge "A15 Bionic" designed_by "Apple Inc.") or a dropped qualifier (e.g., a temporal constraint such as "in 2002").
  • Figure 2: Overview of the C2RAG workflow. (i) Constraint-based retrieval: the query is decomposed into an ordered sequence of atomic constraint triples with relation variants and placeholders, and each constraint is executed via anchor matching, relation filtering, and contextual reranking to produce candidates; (ii) sufficiency check: a hop-wise score determines whether to propagate the induced bindings to the next hop or to activate textual recovery when structural evidence is insufficient; (iii) the resulting evidence is consolidated and fed to the LLM for answer generation.
  • Figure 3: Robustness on MuSiQue (100 queries) under degraded KG quality. (a,b) QA performance as spurious noise is injected or incomplete information is introduced at increasing ratios around query-critical entities. (c) Hop-wise diffusion of KG quality issues measured by the proportion of unsupported evidence across hops under different control settings.
  • Figure 4: Hyperparameter sensitivity of C2RAG. We vary (a) the sufficiency check threshold $\gamma$, (b) the number of relation variants $m$, (c) the structural candidate budget $K_s$, and (d) the textual recovery budget $K_t$, reporting EM and the QA token ratio.
  • Figure 5: A case study trace for C2RAG (two-hop). Each block reports the initial 1-hop top-$K$ candidates (before reranking), the constraint-aware reranked distribution, the solvability signal ($N_{\mathrm{eff}}$ vs. threshold $\gamma$), and the hop decision (Resolved / Unresolved), with optional textual recovery and final answer generation.
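Figures 2 and 5 describe a per-hop decision: rerank candidates, compute $N_{\mathrm{eff}}$ of the reranked distribution, compare it with $\gamma$, then either propagate a binding or fall back to textual recovery. The Python sketch below illustrates that control flow under assumptions not taken from the paper: rerank scores become probabilities via a softmax, $N_{\mathrm{eff}}$ is the inverse Simpson index, a hop counts as Resolved when $N_{\mathrm{eff}} \le \gamma$, and `retrieve` and `textual_recovery` are hypothetical callbacks standing in for the real retrievers.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    # Numerically stable softmax: turns rerank scores into a distribution.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def n_eff(p: Sequence[float]) -> float:
    # ASSUMED definition: inverse Simpson index 1 / sum_i p_i^2.
    return 1.0 / sum(x * x for x in p)


@dataclass
class HopResult:
    hop: int
    status: str              # "Resolved" or "Unresolved"
    n_eff: float
    binding: Optional[str]   # value propagated to the next hop


# Hypothetical interfaces: (hop, previous binding) -> candidates / recovered binding.
Retriever = Callable[[int, Optional[str]], Dict[str, float]]
Recovery = Callable[[int, Optional[str]], Optional[str]]


def run_hops(num_hops: int, retrieve: Retriever, gamma: float,
             textual_recovery: Recovery) -> List[HopResult]:
    trace: List[HopResult] = []
    binding: Optional[str] = None
    for hop in range(1, num_hops + 1):
        cands = retrieve(hop, binding)
        if not cands:
            # No structural candidates at all (e.g. a missing bridge edge).
            binding = textual_recovery(hop, binding)
            trace.append(HopResult(hop, "Unresolved", float("inf"), binding))
            continue
        names = list(cands)
        p = softmax([cands[c] for c in names])
        score = n_eff(p)
        if score <= gamma:
            # Mass is concentrated: accept the top candidate and propagate it.
            binding = names[max(range(len(p)), key=p.__getitem__)]
            trace.append(HopResult(hop, "Resolved", score, binding))
        else:
            # Mass is diffuse: structural evidence is insufficient, use text.
            binding = textual_recovery(hop, binding)
            trace.append(HopResult(hop, "Unresolved", score, binding))
    return trace


def toy_retrieve(hop: int, prev: Optional[str]) -> Dict[str, float]:
    if hop == 1:
        return {"A15 Bionic": 5.0, "A14 Bionic": 1.0, "M1": 0.5}  # peaked
    return {"Apple Inc.": 1.0, "TSMC": 0.9, "Samsung": 0.8}       # near tie


trace = run_hops(2, toy_retrieve, gamma=1.5,
                 textual_recovery=lambda hop, prev: None)
for r in trace:
    print(r.hop, r.status, round(r.n_eff, 2), r.binding)
```

In the toy run, the first hop's scores are sharply peaked, so the check passes and `A15 Bionic` is propagated; the second hop's scores are nearly tied, so $N_{\mathrm{eff}}$ is close to 3 and the hop is marked Unresolved. Treating a low $N_{\mathrm{eff}}$ as "sufficient" is the choice that ties the check to a dominant candidate.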

Theorems & Definitions (6)

  • Lemma 1.1: Range and extremal cases
  • proof
  • Lemma 1.2: Lower bound on the top probability
  • proof
  • Proposition 1.3: Sufficiency check guarantees a dominant candidate
  • proof
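The proofs are not reproduced in this excerpt. Assuming the inverse Simpson definition $N_{\mathrm{eff}}(p) = 1/\sum_{c} p(c)^2$ with $n = |\mathcal{C}|$, and assuming the sufficiency check passes when $N_{\mathrm{eff}}(p) \le \gamma$, each listed result follows in one line. This is a reconstruction under those assumptions, not the paper's own proofs.

```latex
% Assumptions: N_eff(p) = 1 / \sum_c p(c)^2, n = |\mathcal{C}|, check passes iff N_eff(p) <= \gamma.
\begin{align*}
&\text{Lemma 1.1: } \sum_{c} p(c)^2 \le \sum_{c} p(c) = 1 \;\Rightarrow\; N_{\mathrm{eff}}(p) \ge 1, \\
&\qquad 1 = \Big(\sum_{c} p(c)\Big)^2 \le n \sum_{c} p(c)^2 \;\Rightarrow\; N_{\mathrm{eff}}(p) \le n
  \quad \text{(Cauchy--Schwarz)}, \\
&\text{Lemma 1.2: } \sum_{c} p(c)^2 \le \max_{c} p(c) \sum_{c} p(c) = \max_{c} p(c)
  \;\Rightarrow\; \max_{c} p(c) \ge \frac{1}{N_{\mathrm{eff}}(p)}, \\
&\text{Proposition 1.3: } N_{\mathrm{eff}}(p) \le \gamma \;\Rightarrow\; \max_{c} p(c) \ge \frac{1}{\gamma}.
\end{align*}
```

The extremal cases follow directly: a point mass gives $\sum_c p(c)^2 = 1$, and the uniform distribution gives $\sum_c p(c)^2 = n \cdot n^{-2} = 1/n$. In particular, any $\gamma < 2$ forces $\max_c p(c) > 1/2$, so the top candidate outweighs all others combined.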