BridgeRAG: Training-Free Bridge-Conditioned Retrieval for Multi-Hop Question Answering

Andre Bacellar

Abstract

Multi-hop retrieval is not a single-step relevance problem: later-hop evidence should be ranked by its utility conditioned on retrieved bridge evidence, not by similarity to the original query alone. We present BridgeRAG, a training-free, graph-free retrieval method for retrieval-augmented generation (RAG) over multi-hop questions that operationalizes this view with a tripartite scorer $s(q,b,c)$ over (question, bridge, candidate). BridgeRAG separates coverage from scoring: dual-entity ANN expansion broadens the second-hop candidate pool, while a bridge-conditioned LLM judge identifies the active reasoning chain among competing candidates, without any offline graph or proposition index. Across four controlled experiments we show that this conditioning signal is (i) selective: +2.55pp on parallel-chain queries (p<0.001) vs. ~0 on single-chain subtypes; (ii) irreplaceable: substituting generated SVO query text for the retrieved bridge passage reduces R@5 by 2.1pp, performing worse than even the lowest-SVO-similarity pool passage; (iii) predictable: the similarity $\cos(b, g_2)$ between the bridge and the gold second-hop passage correlates with per-query gain (Spearman $\rho = 0.104$, p<0.001); and (iv) mechanistically precise: bridge conditioning causes productive re-rankings (an 18.7% flip-win rate on parallel-chain vs. 0.6% on single-chain), not merely more churn. Combined with lightweight coverage expansion and percentile-rank score fusion, BridgeRAG achieves the best published training-free R@5 under matched benchmark evaluation on all three standard MHQA benchmarks, without a graph database: 0.8146 on MuSiQue (+3.1pp vs. PropRAG, +6.8pp vs. HippoRAG2), 0.9527 on 2WikiMultiHopQA (+1.2pp vs. PropRAG), and 0.9875 on HotpotQA (+1.35pp vs. PropRAG).
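
As a concrete illustration of the percentile-rank (PIT) fusion step, here is a minimal Python sketch. The paper fixes $\alpha{=}0.1$ (see Figure 2), but which component $\alpha$ weights and the exact percentile normalization are assumptions made for illustration.

```python
# Minimal sketch of percentile-rank (PIT) score fusion over a candidate pool.
def percentile_ranks(scores):
    """Map raw scores to percentile ranks in (0, 1], so heterogeneous scales
    (cosine similarity vs. LLM judge scores) become directly comparable."""
    n = len(scores)
    return [sum(other <= s for other in scores) / n for s in scores]

def fuse(retrieval_scores, judge_scores, alpha=0.1):
    """Blend PIT-normalized retrieval and judge scores per candidate.
    NOTE: weighting the retrieval side by alpha is an assumption."""
    r = percentile_ranks(retrieval_scores)
    j = percentile_ranks(judge_scores)
    return [alpha * ri + (1.0 - alpha) * ji for ri, ji in zip(r, j)]

# Example: re-rank a small pool by fused score and keep the top-5.
fused = fuse([0.71, 0.65, 0.80], [0.2, 0.9, 0.4])
top5 = sorted(range(len(fused)), key=lambda i: -fused[i])[:5]
```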

Figures (2)

  • Figure 1: Bridge conditioning resolves chain ambiguity. Without the bridge, a 2-way judge over $(q, c)$ promotes a passage about the Terminator film (same entity surface, wrong chain). Conditioning on bridge $b$ (the Schwarzenegger passage) allows the judge to identify Maria Shriver as the correct second-hop target.
  • Figure 2: BridgeRAG pipeline. Hop 1 (left): query $q$ is embedded with NV-Embed-v2 and retrieved via ANN; the top-1 passage becomes bridge $b$. Entity branch (centre): a Llama 3.3 70B call extracts entities $e_1$, $e_2$ from $b$; each is used for an independent ANN retrieval (top-5), yielding entity-grounded candidates. SVO branch (right): $q$ and $b$ condition a second Llama call that generates $N{=}3$ SVO queries; each is embedded and retrieved ($3{\times}$ANN, union_max), yielding SVO-15 candidates. Pool: SVO-15 $\cup$ $e_1$-5 $\cup$ $e_2$-5 $\to$ top-20. Judge: a tripartite judge scores every $c_i$ via $s(q,b,e_1,e_2,c_i)$; scores are PIT-fused ($\alpha{=}0.1$) to produce the final top-5. A code sketch of this flow follows below.
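
To make the Figure 2 data flow concrete, the sketch below strings the stages together. `embed`, `ann`, `extract_entities`, `gen_svo_queries`, and `judge` are hypothetical stand-ins for the NV-Embed-v2 encoder, the ANN index, and the two Llama 3.3 70B calls (the paper does not name these interfaces); it reuses `fuse` from the fusion sketch above, and max-merging duplicate passages across the entity and SVO branches is an assumption.

```python
def bridge_rag(q, embed, ann, extract_entities, gen_svo_queries, judge,
               n_svo=3, k_pool=20, k_final=5, alpha=0.1):
    """Hedged sketch of the BridgeRAG pipeline; only the control flow
    (bridge -> entity/SVO branches -> pool -> judge -> fusion) follows
    Figure 2. `ann(vec, k)` is assumed to return [(passage, score)]."""
    # Hop 1: the top-1 ANN hit for the question becomes the bridge b.
    b, _ = ann(embed(q), k=1)[0]

    # Entity branch: one LLM call extracts e1, e2 from b; each grounds an
    # independent top-5 ANN retrieval.
    e1, e2 = extract_entities(b)
    pool = dict(ann(embed(e1), k=5) + ann(embed(e2), k=5))

    # SVO branch: N=3 bridge-conditioned SVO queries, each retrieved top-5
    # (SVO-15); union_max keeps each passage's best score across retrievals.
    for sq in gen_svo_queries(q, b, n_svo):
        for passage, score in ann(embed(sq), k=5):
            pool[passage] = max(score, pool.get(passage, score))

    # Pool: union of both branches, truncated to top-20 by retrieval score.
    top20 = sorted(pool.items(), key=lambda kv: -kv[1])[:k_pool]

    # Judge + fusion: the tripartite judge scores each candidate given
    # (q, b, e1, e2); judge and retrieval scores are PIT-fused (alpha = 0.1).
    judged = [judge(q, b, e1, e2, c) for c, _ in top20]
    fused = fuse([s for _, s in top20], judged, alpha)  # `fuse` defined above
    ranked = sorted(zip([c for c, _ in top20], fused), key=lambda x: -x[1])
    return [c for c, _ in ranked[:k_final]]
```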