FunnelRAG: A Coarse-to-Fine Progressive Retrieval Paradigm for RAG

Xinping Zhao; Yan Zhong; Zetian Sun; Xinshuo Hu; Zhenyu Liu; Dongfang Li; Baotian Hu; Min Zhang

TL;DR

FunnelRAG addresses the inefficiency and limited performance ceiling of flat retrieval in Retrieval-Augmented Generation by introducing a coarse-to-fine progressive retrieval pipeline that combines large-to-small candidate sets, coarse-to-fine granularity, and mixed-capacity retrievers. The method comprises three stages (retrieval of long coarse units, pre-ranking of documents within clusters, and post-ranking of fine-grained passages), augmented by L2G distillation to align signals across stages. Empirical results on Natural Questions and TriviaQA show about a 40% reduction in retrieval time with comparable or improved answer recall, and generation benefits in most settings, especially at tighter cutoffs. The work demonstrates that orchestrating simple-to-complex retrievers along a progressive pipeline yields substantial efficiency gains while preserving retrieval quality and contextual integrity.

Abstract

Retrieval-Augmented Generation (RAG) is widely used with Large Language Models. It mainly consists of retrieval and generation. The retrieval modules (a.k.a. retrievers) aim to find useful information to facilitate the generation modules (a.k.a. generators). As such, generators' performance largely depends on the effectiveness and efficiency of retrievers. However, the widely used retrieval paradigm remains flat: it treats retrieval as a one-off procedure with constant granularity. Despite its effectiveness, we argue that it suffers from two limitations: (1) flat retrieval places a significant burden on a single retriever; (2) constant granularity limits the ceiling of retrieval performance. In this work, we propose a progressive retrieval paradigm with coarse-to-fine granularity for RAG, termed FunnelRAG, to balance effectiveness and efficiency. Specifically, FunnelRAG establishes a progressive retrieval pipeline by combining coarse-to-fine granularity, large-to-small quantity, and low-to-high capacity, which relieves the burden on a single retriever and raises the ceiling of retrieval performance. Extensive experiments show that FunnelRAG achieves comparable retrieval performance while reducing time overhead by nearly 40 percent.
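
To make the funnel shape concrete, the three stages summarized above can be sketched as a toy pipeline: a cheap scorer narrows many coarse clusters to a few, a slightly costlier scorer pre-ranks the documents inside them, and a final scorer ranks short passages cut from the surviving documents. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`overlap_score`, `tf_score`, `funnel_retrieve`), the cutoffs, the fixed-length passage splitting, and the demo data are all hypothetical, and the simple term-matching scorers only stand in for the low-to-high-capacity retrievers the paper describes.

```python
import re
from collections import Counter
from typing import Callable, Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


def overlap_score(query: str, text: str) -> float:
    """Cheap stand-in scorer: number of distinct query terms found in text."""
    return float(len(set(tokenize(query)) & set(tokenize(text))))


def tf_score(query: str, text: str) -> float:
    """Costlier stand-in scorer: query term frequency normalized by length."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[t] for t in set(tokenize(query))) / len(tokens)


def top_k(items: List[str], score: Callable[[str], float], k: int) -> List[str]:
    """Keep the k highest-scoring items (ties keep their original order)."""
    return sorted(items, key=score, reverse=True)[:k]


def funnel_retrieve(
    query: str,
    clusters: Dict[str, List[str]],  # hypothetical layout: cluster id -> documents
    k_clusters: int = 2,
    k_docs: int = 2,
    k_passages: int = 1,
    passage_len: int = 6,  # passage size in whitespace tokens (assumption)
) -> List[str]:
    # Stage 1 (retrieval): score long coarse units with the cheapest scorer.
    cluster_ids = top_k(
        list(clusters),
        lambda c: overlap_score(query, " ".join(clusters[c])),
        k_clusters,
    )

    # Stage 2 (pre-ranking): rank documents inside the surviving clusters.
    docs = [d for c in cluster_ids for d in clusters[c]]
    docs = top_k(docs, lambda d: tf_score(query, d), k_docs)

    # Stage 3 (post-ranking): cut documents into short passages and rank them.
    passages: List[str] = []
    for d in docs:
        words = d.split()
        passages += [
            " ".join(words[i:i + passage_len])
            for i in range(0, len(words), passage_len)
        ]
    return top_k(passages, lambda p: tf_score(query, p), k_passages)


demo_clusters = {
    "geography": [
        "Paris is the capital of France and sits on the Seine river",
        "Berlin is the capital of Germany",
    ],
    "cooking": [
        "Bread needs flour water yeast and salt",
        "Pasta is boiled in salted water",
    ],
    "astronomy": ["Mars is the fourth planet from the Sun"],
}

print(funnel_retrieve("capital of France", demo_clusters))
```

Each stage sees fewer candidates at a finer granularity than the one before it, so the more expensive scoring is only ever applied to a small set. In a real system the three scorers would be replaced by retrievers of increasing capacity; which models to use at each stage is left open here.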
