OpenRAG: Optimizing RAG End-to-End via In-Context Retrieval Learning

Jiawei Zhou, Lei Chen

TL;DR

OpenRAG tackles the misalignment between IR-derived retriever relevance and RAG performance by learning in-context relevance end-to-end. It combines offline RAG warmup with online in-training retrieval using a semi-parametric disentangled retriever (SiDR) and a contrastive objective to align the retriever with downstream evaluation. Across four benchmarks, OpenRAG yields a 4.0% improvement over the original retriever and 2.1% over state-of-the-art retrievers, with notable gains on PubHealth and potential to surpass some 8B LLM-based approaches in cost-sensitive settings. The results demonstrate that retrieval learning is a potent lever for enhancing RAG systems and can transfer to other LLMs for open-ended generation, albeit with some limitations for closed-set tasks.
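As a rough illustration of what a contrastive objective for retriever tuning can look like, the sketch below computes an InfoNCE-style loss for a single query. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `contrastive_loss`, the dot-product scoring, the `temperature` parameter, and the idea that positives are the documents whose inclusion in context led to a correct downstream answer are all illustrative choices, and the exact loss used by OpenRAG is not given in this summary.

```python
import math


def contrastive_loss(query_vec, pos_vecs, neg_vecs, temperature=1.0):
    """InfoNCE-style loss for one query (hypothetical helper, for illustration).

    pos_vecs: embeddings of documents assumed to help the downstream RAG task.
    neg_vecs: embeddings of documents assumed not to help.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    pos_scores = [dot(query_vec, d) / temperature for d in pos_vecs]
    neg_scores = [dot(query_vec, d) / temperature for d in neg_vecs]
    all_scores = pos_scores + neg_scores

    # Log-sum-exp over all candidates, shifted by the max for numerical stability.
    m = max(all_scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in all_scores))

    # Average negative log-probability assigned to the positive documents.
    return -sum(s - log_z for s in pos_scores) / len(pos_scores)


if __name__ == "__main__":
    query = [1.0, 0.0]
    helpful = [[1.0, 0.0]]      # assumed in-context positive
    unhelpful = [[0.0, 1.0]]    # assumed in-context negative
    print(contrastive_loss(query, helpful, unhelpful))
```

Minimizing this quantity pushes the query representation toward documents labeled as helpful and away from the rest, which is the general sense in which a contrastive objective can align a retriever with downstream evaluation.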

Abstract

In this paper, we analyze and empirically show that the relevance learned for conventional information retrieval (IR) scenarios may be inconsistent with the relevance needed in retrieval-augmented generation (RAG) scenarios. To bridge this gap, we introduce OpenRAG, a RAG framework that is optimized end-to-end by tuning the retriever to capture in-context relevance, enabling adaptation to diverse and evolving needs. Extensive experiments across a wide range of tasks demonstrate that OpenRAG, by tuning a retriever end-to-end, yields a consistent improvement of 4.0% over the original retriever and outperforms existing state-of-the-art retrievers by 2.1%. Additionally, our results indicate that for some tasks, an end-to-end tuned 0.2B retriever can achieve improvements that surpass those of RAG-oriented or instruction-tuned 8B large language models (LLMs), highlighting the cost-effectiveness of our approach in enhancing RAG systems.

Paper Structure

This paper contains 35 sections, 9 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: Comparison of query-document relevance in IR scenario and RAG scenario.
  • Figure 2: Illustration of the OpenRAG training process.
  • Figure 3: Ablation studies on NQ and PubHealth datasets.
  • Figure 4: Illustration of the semi-parametric disentangled retriever (SiDR) framework, adapted from Zhou et al. (2024).
  • Figure 5: RAG accuracy of different in-training retrieval approaches.
  • ...and 7 more figures