BubbleRAG: Evidence-Driven Retrieval-Augmented Generation for Black-Box Knowledge Graphs

Duyi Pan; Tianao Lou; Xin Li; Haoze Song; Yiwen Wu; Mengyi Deng; Mingyu Yang; Wei Wang

Abstract

Large Language Models (LLMs) exhibit hallucinations in knowledge-intensive tasks. Graph-based retrieval-augmented generation (RAG) has emerged as a promising solution, yet existing approaches suffer from fundamental recall and precision limitations when operating over black-box knowledge graphs, that is, graphs whose schema and structure are unknown in advance. We identify three core challenges: two that cause recall loss (semantic instantiation uncertainty and structural path uncertainty) and one that causes precision loss (evidential comparison uncertainty). To address these challenges, we formalize the retrieval task as the Optimal Informative Subgraph Retrieval (OISR) problem, a variant of Group Steiner Tree, and prove it to be NP-hard and APX-hard. We propose BubbleRAG, a training-free pipeline that systematically optimizes for both recall and precision through semantic anchor grouping, heuristic bubble expansion to discover candidate evidence graphs (CEGs), composite ranking, and reasoning-aware expansion. Experiments on multi-hop QA benchmarks demonstrate that BubbleRAG achieves state-of-the-art results, outperforming strong baselines in both F1 and accuracy while remaining plug-and-play.

Paper Structure (21 sections, 2 theorems, 8 equations, 4 figures, 5 tables, 1 algorithm)

Introduction
Preliminaries
Graph-Based RAG
Black-Box Knowledge Graphs
Motivations
Problem Formulation
The BubbleRAG Framework
Data Preparation
Semantic Anchor Grouping
Candidate Evidence Graph Discovery
Candidate Evidence Graph Ranking
Reasoning-Aware Expansion
Answer Generation
Experiments
Experimental Setup
...and 6 more sections

Key Result

Theorem 1

The Optimal Informative Subgraph Retrieval problem is NP-hard.
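For reference, the abstract describes OISR as a variant of Group Steiner Tree. The textbook form of that classical problem can be written as follows. This is not the paper's OISR definition, which is not reproduced on this page, and the symbols $G$, $w$, $g_i$, and $T$ are notation assumed for this sketch.

```latex
\textbf{Group Steiner Tree.} Given a connected undirected graph $G = (V, E)$
with edge weights $w : E \to \mathbb{R}_{\ge 0}$ and groups
$g_1, \dots, g_k \subseteq V$, find a tree $T = (V_T, E_T)$ in $G$ that
\[
  \text{minimizes } \sum_{e \in E_T} w(e)
  \quad \text{subject to} \quad
  V_T \cap g_i \neq \emptyset \ \text{ for all } i \in \{1, \dots, k\}.
\]
```

When every group contains a single vertex, this reduces to the Steiner tree problem, which is itself NP-hard.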

Figures (4)

Figure 1: Three challenges in black-box knowledge graph (KG) retrieval that limit recall and precision: (a) Semantic instantiation uncertainty when grounding query concepts into KG entities, risking recall loss; (b) Structural path uncertainty when determining relevant relational chains, risking recall loss; and (c) Evidential comparison uncertainty when ranking candidates based on implicit evidence, risking precision loss.
Figure 2: Pipeline of BubbleRAG.
Figure 3: An example of the Candidate Evidence Graph (CEG) generated by Bubble Expansion. Different highlight colors in the query represent different extracted keywords, which are mapped to their corresponding semantic anchor groups in the graph. Elements with red borders indicate the nodes and edges that have been successfully incorporated into a CEG.
Figure 4: An example of Candidate Evidence Graph (CEG) Ranking. Different highlight colors in the query represent different semantic anchor groups, each assigned an importance weight. The final score of a CEG is determined by two components: its average semantic cost and a structural incompleteness penalty derived from missing concept groups. Adjusting the hyperparameter $\alpha$ changes the ranking so that it supports either AND or OR operations. The algorithm also supports comparison queries by selecting the top-$n$ CEGs.
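The two-component score described in the Figure 4 caption can be illustrated with a minimal sketch. The exact formula is not given on this page, so everything below is an assumption: the score is taken to be the average semantic cost plus $\alpha$ times the summed importance weights of the missing anchor groups, lower scores rank first, and the names `CEG`, `ceg_score`, and `rank_cegs` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CEG:
    """Hypothetical container for one candidate evidence graph."""
    name: str
    semantic_costs: tuple[float, ...]  # assumed per-element costs, lower = closer to the query
    covered_groups: frozenset[str]     # semantic anchor groups this CEG touches


def ceg_score(ceg: CEG, group_weights: dict[str, float], alpha: float) -> float:
    """Assumed composite score: average semantic cost + alpha * missing-group weight."""
    # Assumes semantic_costs is non-empty.
    avg_cost = sum(ceg.semantic_costs) / len(ceg.semantic_costs)
    missing_weight = sum(
        w for group, w in group_weights.items() if group not in ceg.covered_groups
    )
    return avg_cost + alpha * missing_weight


def rank_cegs(cegs, group_weights, alpha, top_n=1):
    """Return the top_n CEGs with the lowest assumed score."""
    return sorted(cegs, key=lambda c: ceg_score(c, group_weights, alpha))[:top_n]


# Illustrative data only; weights and costs are made up for this sketch.
weights = {"director": 0.5, "film": 0.3, "award": 0.2}
full = CEG("full", (0.4, 0.5, 0.6), frozenset({"director", "film", "award"}))
partial = CEG("partial", (0.1, 0.2), frozenset({"director", "film"}))

or_like = rank_cegs([full, partial], weights, alpha=0.0)   # coverage ignored
and_like = rank_cegs([full, partial], weights, alpha=5.0)  # missing groups penalized
```

Under these assumptions, `alpha=0.0` ranks purely by semantic cost and so tolerates missing groups (OR-like), while a large `alpha` pushes CEGs that cover every group to the top (AND-like). Passing `top_n` greater than 1 returns several CEGs, which is one way the comparison-query case in the caption could be served.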

Theorems & Definitions (8)

Example 1
Example 2
Definition 1: Optimal Informative Subgraph Retrieval (OISR)
Example 3
Example 4
Example 5
Theorem 1
Theorem 2
