PropRAG: Guiding Retrieval with Beam Search over Proposition Paths
Jingjin Wang, Jiawei Han
TL;DR
PropRAG addresses the limitations of passage-level and triple-based RAG by using context-rich propositions as knowledge units, building a proposition graph offline, and running an LLM-free beam search online to discover multi-hop reasoning paths. Retrieval proceeds in two stages: Stage 1 induces a coarse subgraph via Personalized PageRank (PPR), and Stage 2 discovers and ranks paths with beam search, assembling coherent evidence chains without online LLM calls. On MuSiQue, 2Wiki, and HotpotQA, PropRAG reports state-of-the-art zero-shot Recall@5 and F1, and ablations confirm the contribution of propositions, graph guidance, and beam search. The efficiency analysis points to a favorable offline-online trade-off: a higher upfront cost for proposition extraction buys significantly better retrieval quality and avoids costly LLM inference at retrieval time, which makes multi-hop evidence gathering for LLMs practical.
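The two-stage retrieval described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy propositions, the `neighbors` adjacency (propositions linked when they share an entity), the lexical `coverage` scorer standing in for whatever LLM-free scoring PropRAG actually uses, and all function names and parameters are assumptions made for illustration.

```python
def tokens(text):
    """Lowercased whitespace tokens; a toy stand-in for a real encoder."""
    return set(text.lower().split())


def coverage(path, propositions, query):
    """Fraction of query tokens covered by the propositions on a path (assumed scorer)."""
    q = tokens(query)
    covered = set()
    for pid in path:
        covered |= tokens(propositions[pid]) & q
    return len(covered) / len(q)


def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Stage 1 (sketch): power-iteration PPR over an adjacency dict {node: [neighbors]}."""
    total = sum(seeds.values())
    restart = {n: seeds.get(n, 0.0) / total for n in graph}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * restart[n] for n in graph}
        for n, nbrs in graph.items():
            if nbrs:
                share = damping * rank[n] / len(nbrs)
                for m in nbrs:
                    nxt[m] += share
            else:
                # Dangling node: return its mass to the restart distribution.
                for m in graph:
                    nxt[m] += damping * rank[n] * restart[m]
        rank = nxt
    return rank


def beam_search_paths(propositions, neighbors, query, candidates,
                      beam_width=2, max_hops=2):
    """Stage 2 (sketch): beam search over proposition paths inside the subgraph."""
    allowed = set(candidates)

    def score(path):
        return coverage(path, propositions, query)

    beam = sorted(([p] for p in candidates), key=score, reverse=True)[:beam_width]
    for _ in range(max_hops - 1):
        expanded = list(beam)  # shorter paths stay in the running
        for path in beam:
            for nxt in neighbors[path[-1]]:
                if nxt in allowed and nxt not in path:
                    expanded.append(path + [nxt])
        beam = sorted(expanded, key=score, reverse=True)[:beam_width]
    return beam


# Hypothetical toy corpus: four propositions and their shared-entity links.
propositions = {
    "p0": "Alice Munro wrote the short story collection Dear Life",
    "p1": "Alice Munro was born in Wingham Ontario",
    "p2": "Wingham is a community in Huron County",
    "p3": "Bob Dylan was born in Duluth",
}
neighbors = {"p0": ["p1"], "p1": ["p0", "p2"], "p2": ["p1"], "p3": []}
query = "where was the author of dear life born"

seeds = {pid: coverage([pid], propositions, query) for pid in propositions}
ranks = personalized_pagerank(neighbors, seeds)
subgraph = sorted(ranks, key=ranks.get, reverse=True)[:3]  # coarse subgraph
paths = beam_search_paths(propositions, neighbors, query, subgraph)
```

On this toy data, Stage 1 keeps `p0`, `p1`, and `p2` and drops the disconnected `p3`, even though `p3` matches the query lexically as well as `p1` does. Stage 2 then ranks the two-hop path `p0 -> p1` first, because the two propositions together cover more of the query than either does alone. No LLM is called at any point; a production system would presumably swap the lexical scorer for a stronger LLM-free one, such as embedding similarity.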
Abstract
Retrieval-Augmented Generation (RAG) has become the standard approach for equipping Large Language Models (LLMs) with up-to-date knowledge. However, standard RAG, relying on independent passage retrieval, often fails to capture the interconnected nature of information required for complex, multi-hop reasoning. While structured RAG methods attempt to address this using knowledge graphs built from triples, we argue that the inherent context loss of triples (context collapse) limits the fidelity of the knowledge representation. We introduce PropRAG, a novel RAG framework that shifts from triples to context-rich propositions and introduces an efficient, LLM-free online beam search over proposition paths to discover multi-step reasoning chains. By coupling a higher-fidelity knowledge representation with explicit path discovery, PropRAG achieves state-of-the-art zero-shot Recall@5 and F1 scores on 2Wiki, HotpotQA, and MuSiQue. These results advance non-parametric knowledge integration by improving evidence retrieval through richer representation and efficient reasoning path discovery.
