RoseRAG: Robust Retrieval-augmented Generation with Small-scale LLMs via Margin-aware Preference Optimization

Tianci Liu, Haoxiang Jiang, Tianze Wang, Ran Xu, Yue Yu, Linjun Zhang, Tuo Zhao, Haoyu Wang

TL;DR

RoseRAG tackles the challenge of making small-scale LLMs robust in retrieval-augmented generation by introducing a margin-aware preference optimization framework. It combines three stages—Preference Data Generation with multi-turn prompting and rejection sampling, Preference Data Selection via contrastive margin maximization, and Preference Optimization with the ORPO loss—to align SLM outputs with high-quality responses without distilling from larger models. Empirically, RoseRAG consistently outperforms state-of-the-art baselines on HotPotQA, 2WikiMultiHopQA, and StrategyQA across multiple small backbones, and demonstrates the critical roles of data selection and rejection sampling in boosting performance. The approach is robust to different retrieval sizes and optimization strategies, offering a practical, scalable path to reliable RAG for resource-constrained deployments.
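As a rough illustration of the final stage, the sketch below computes an ORPO-style loss for a single preference pair, following the standard ORPO formulation (a supervised negative log-likelihood term plus a weighted log-odds-ratio penalty). The function names, the use of length-averaged log-probabilities as inputs, and the default weight `lam=0.1` are assumptions made for this example; they are not taken from the paper.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(avg_logp_chosen: float, avg_logp_rejected: float, lam: float = 0.1) -> float:
    """ORPO-style loss for one (chosen, rejected) pair.

    Inputs are length-averaged token log-probabilities of each response
    under the model being trained (assumed to be computed elsewhere).
    """
    # Supervised term: negative log-likelihood of the preferred response.
    sft = -avg_logp_chosen
    # Log odds ratio between the preferred and non-preferred responses.
    ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    # -log(sigmoid(ratio)) written as a numerically stable softplus(-ratio).
    or_term = max(-ratio, 0.0) + math.log1p(math.exp(-abs(ratio)))
    return sft + lam * or_term


if __name__ == "__main__":
    # The penalty shrinks as the chosen response becomes relatively more likely.
    print(orpo_loss(-0.5, -0.5))
    print(orpo_loss(-0.5, -2.0))
```

With equal log-probabilities the odds-ratio term equals log 2; it decreases toward zero as the preferred response gains likelihood relative to the non-preferred one.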

Abstract

Large language models (LLMs) have achieved impressive performance but face high computational costs and latency, limiting their deployment in resource-constrained settings. In contrast, small-scale LLMs (SLMs) are more efficient yet struggle to capture evolving real-world knowledge. Retrieval-augmented generation (RAG) helps by integrating external knowledge, but imperfect retrieval can introduce distracting noise that misleads SLMs. We propose RoseRAG, a robust RAG framework for SLMs via Margin-aware Preference Optimization. RoseRAG employs multi-turn prompting for detailed reasoning, rejection sampling for high-quality explanations, and contrastive preference selection to refine responses by maximizing the likelihood gap between preferred and non-preferred outputs. By integrating these components into a margin-aware optimization process, RoseRAG robustly enhances the accuracy and reliability of SLMs for RAG applications. Extensive experiments on three open-domain question answering benchmarks indicate that our innovative RoseRAG surpasses state-of-the-art baselines significantly.
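The rejection sampling and contrastive selection steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Candidate` type, the exact-match acceptance rule, and the use of a precomputed per-response log-likelihood as the margin score are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    rationale: str   # generated reasoning text
    answer: str      # final answer extracted from the rationale
    logprob: float   # log-likelihood under the SLM (assumed precomputed)


def _norm(text: str) -> str:
    return text.strip().lower()


def rejection_sample(
    candidates: List[Candidate], gold: str
) -> Tuple[List[Candidate], List[Candidate]]:
    """Split candidates by whether their answer matches the gold answer."""
    accepted = [c for c in candidates if _norm(c.answer) == _norm(gold)]
    rejected = [c for c in candidates if _norm(c.answer) != _norm(gold)]
    return accepted, rejected


def select_pair(
    accepted: List[Candidate], rejected: List[Candidate]
) -> Optional[Tuple[Candidate, Candidate]]:
    """Return the (preferred, non-preferred) pair with the largest likelihood gap."""
    if not accepted or not rejected:
        return None  # no usable preference pair for this question
    return max(
        ((w, l) for w in accepted for l in rejected),
        key=lambda pair: pair[0].logprob - pair[1].logprob,
    )


candidates = [
    Candidate("r1", "Paris", -2.0),
    Candidate("r2", "paris ", -5.0),
    Candidate("r3", "Lyon", -1.0),
    Candidate("r4", "Nice", -7.0),
]
accepted, rejected = rejection_sample(candidates, gold="Paris")
pair = select_pair(accepted, rejected)
```

In this toy input, `r1` and `r2` are accepted, and the selected pair is `r1` (preferred) against `r4` (non-preferred), since that combination has the widest log-likelihood gap.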

Paper Structure

This paper contains 26 sections, 6 theorems, 28 equations, 10 figures, 4 tables.

Key Result

Lemma 5.1

Under Assumption (assump:large space), the solution to optimizing Eqn. (eq:orpo) is [equation not preserved in this extract], where $Z(x)$ is the partition function ensuring $\sum_y P_\theta(y|x) = 1$.

Figures (10)

  • Figure 1: Pilot studies. Ground-truth documents with varying amounts of noisy documents (fig:ground_truth), and performance w.r.t. varying numbers of retrieved documents (fig:retrieved_docs). Both sub-figures show results with Qwen2.5-1.5B-Instruct on HotpotQA.
  • Figure 2: Framework of proposed RoseRAG.
  • Figure 3: Overview of the rationale generation process.
  • Figure 4: Accuracy of RoseRAG with and without rejection sampling with Qwen2.5-1.5B-Instruct.
  • Figure 5: Comparison of different experimental settings. Experiments are conducted on HotPotQA with Qwen2.5-1.5B-Instruct as the backbone.
  • ...and 5 more figures

Theorems & Definitions (9)

  • Lemma 5.1
  • Theorem 5.1
  • Lemma C.1
  • Lemma C.2
  • proof
  • Lemma C.3
  • proof
  • Theorem C.1
  • proof