Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models

Shayekh Bin Islam; Md Asib Rahman; K S M Tozammel Hossain; Enamul Hoque; Shafiq Joty; Md Rizwan Parvez

TL;DR

Open-RAG addresses the limited reasoning of retrieval-augmented generation with open-source LLMs by transforming a dense model into a parameter-efficient sparse MoE trained with contrastive signals to navigate distractors. It introduces reflection tokens (Retrieval, Relevance, Grounding, Utility) and a hybrid on-demand retrieval mechanism to balance accuracy and speed, enabling effective single- and multi-hop reasoning. Empirical results across eight knowledge-intensive benchmarks show strong gains over open baselines and competitive performance against proprietary RAG systems, with the 13B+MoE variant achieving top results on several multi-hop tasks. The work provides a practical, open-source approach to high-fidelity knowledge reasoning in RAG while outlining memory and domain-extension considerations for future improvement.

Abstract

Retrieval-Augmented Generation (RAG) has been shown to enhance the factual accuracy of Large Language Models (LLMs), but existing methods often suffer from limited reasoning capabilities in effectively using the retrieved evidence, particularly when using open-source LLMs. To mitigate this gap, we introduce a novel framework, Open-RAG, designed to enhance reasoning capabilities in RAG with open-source LLMs. Our framework transforms an arbitrary dense LLM into a parameter-efficient sparse mixture of experts (MoE) model capable of handling complex reasoning tasks, including both single- and multi-hop queries. Open-RAG uniquely trains the model to navigate challenging distractors that appear relevant but are misleading. As a result, Open-RAG leverages latent learning, dynamically selecting relevant experts and integrating external knowledge effectively for more accurate and contextually relevant responses. In addition, we propose a hybrid adaptive retrieval method to determine retrieval necessity and balance the trade-off between performance gain and inference speed. Experimental results show that the Llama2-7B-based Open-RAG outperforms state-of-the-art LLMs and RAG models such as ChatGPT, Self-RAG, and Command R+ in various knowledge-intensive tasks. We open-source our code and models at https://openragmoe.github.io/
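
The hybrid adaptive retrieval idea in the abstract (draft an answer without retrieval, then retrieve only when the model is not confident) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `generate` and `retrieve` callables, the threshold value, and the two confidence scores (minimum token probability and geometric mean of token probabilities) are assumptions introduced here, since this excerpt does not specify how confidence is computed.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces (assumptions, not part of the Open-RAG release):
#   generate(query, passages) -> (answer_text, per-token probabilities)
#   retrieve(query)           -> list of passages
Generate = Callable[[str, List[str]], Tuple[str, List[float]]]
Retrieve = Callable[[str], List[str]]


def min_confidence(token_probs: Sequence[float]) -> float:
    """Confidence as the lowest token probability in the draft answer."""
    return min(token_probs) if token_probs else 0.0


def geometric_mean_confidence(token_probs: Sequence[float]) -> float:
    """Confidence as the geometric mean of token probabilities (all must be > 0)."""
    if not token_probs:
        return 0.0
    return math.exp(sum(math.log(p) for p in token_probs) / len(token_probs))


def answer_with_adaptive_retrieval(
    query: str,
    generate: Generate,
    retrieve: Retrieve,
    threshold: float = 0.5,  # assumed value; tune per task
    confidence: Callable[[Sequence[float]], float] = min_confidence,
) -> Tuple[str, bool]:
    """Return (answer, used_retrieval)."""
    # Step 1: force a draft answer conditioned on no retrieval.
    draft, probs = generate(query, [])
    # Step 2: keep the draft if the model is confident enough.
    if confidence(probs) >= threshold:
        return draft, False
    # Step 3: otherwise retrieve passages and answer again with them.
    passages = retrieve(query)
    grounded, _ = generate(query, passages)
    return grounded, True
```

Raising `threshold` triggers retrieval more often, which trades inference speed for accuracy; lowering it does the opposite.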

Paper Structure (23 sections, 4 equations, 7 figures, 5 tables)

Introduction
Open-RAG: Enhanced Retrieval-Augmented Reasoning
Overview
Open-RAG Training
Data Collection
Parameter-Efficient MoE Finetuning
Hybrid Approach for Adaptive Retrieval
Experiments
Tasks and Datasets
Experimental settings
Baselines
Results and Analysis
Main Results
Performance-Speed by Adaptive Retrieval
Ablation Studies
...and 8 more sections

Figures (7)

Figure 1: Inference pipeline in our framework, Open-RAG. The model learns to generate retrieval/no_retrieval tokens, to contrast relevant and irrelevant contexts, and to categorize answers as partially, fully, or not supported. At inference, given a (multi-hop) user query, we first force the model to generate an answer conditioned on no_retrieval as input, and then use the model's confidence to decide dynamically whether retrieval is needed.
Figure 2: Open-RAG training data preparation involves generating four variations of new training instances from each original pair ($q$, $y$), each incorporating different reflection tokens using ground truth/LLM critic and retrieved passages. Our approach enables an LLM not only to reflect on generation quality but also to contrast distractors.
Figure 3: Architecture transformation (dense to PEFT MoE) in Open-RAG. Router $\mathcal{R}$ is trained from scratch. The FFN layer is kept frozen and adapted by parallel-adapter-based experts $\mathbf{E}$. Other layers are copied.
Figure 4: (Top) Performance vs Retrieval by different adaptive retrieval strategies. (Bottom) Performance vs scores from adaptive retrieval. $f_{ret}$ denotes probability score from external model distilled/predicted reflection token.
Figure 5: Model performances utilizing CRAG contexts
...and 2 more figures
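
The dense-to-MoE transformation described in the Figure 3 caption (a frozen FFN adapted by parallel adapter experts, selected by a router) can be illustrated with a small pure-Python sketch. Everything beyond the caption is an assumption made for illustration: top-k routing, a softmax over the selected router logits, ReLU bottleneck adapters, and adding the gated adapter outputs to the frozen FFN output. The names `AdapterExpert` and `moe_ffn` are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (cols)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(z: Vector) -> Vector:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class AdapterExpert:
    """Bottleneck adapter: down-project, ReLU, up-project (trainable part)."""

    def __init__(self, down: Matrix, up: Matrix):
        self.down = down  # shape: bottleneck x d_model
        self.up = up      # shape: d_model x bottleneck

    def __call__(self, x: Vector) -> Vector:
        hidden = [max(0.0, h) for h in matvec(self.down, x)]
        return matvec(self.up, hidden)


def moe_ffn(
    x: Vector,
    frozen_ffn: Callable[[Vector], Vector],  # original dense FFN, weights untouched
    router: Matrix,                           # shape: num_experts x d_model
    experts: List[AdapterExpert],
    top_k: int = 2,                           # assumed routing width
) -> Vector:
    """Frozen FFN output plus a gated sum of the top-k adapter experts."""
    logits = matvec(router, x)  # one routing logit per expert
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in top])  # renormalize over selected experts
    out = list(frozen_ffn(x))
    for gate, idx in zip(gates, top):
        delta = experts[idx](x)  # adapter runs in parallel with the FFN on the same input
        out = [o + gate * d for o, d in zip(out, delta)]
    return out
```

In this sketch only the router and the adapter matrices would be trained, which is what keeps the MoE parameter-efficient; if every `up` matrix is zero, the layer reproduces the frozen FFN exactly.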
