DecEx-RAG: Boosting Agentic Retrieval-Augmented Generation with Decision and Execution Optimization via Process Supervision

Yongqi Leng, Yikun Lei, Xikai Liu, Meizhi Zhong, Bojian Xiong, Yurong Zhang, Yan Gao, Yi Wu, Yao Hu, Deyi Xiong

TL;DR

DecEx-RAG tackles the inefficiencies of outcome-supervised reinforcement learning in Agentic Retrieval-Augmented Generation by modeling RAG as a two-stage Markov Decision Process with explicit Decision-Making and Execution components. It introduces a pruning-based data expansion method for the search tree, enabling fine-grained process supervision via SFT and DPO. Across six open-domain QA datasets, DecEx-RAG achieves an average absolute improvement of about $6.2\%$ and nearly $6\times$ faster data construction, demonstrating strong data efficiency and cross-domain generalization. The work also notes limitations in its intermediate reward signals and suggests developing evaluation metrics that better reflect process correctness.

Abstract

Agentic Retrieval-Augmented Generation (Agentic RAG) enhances the processing capability for complex tasks through dynamic retrieval and adaptive workflows. Recent advances (e.g., Search-R1) have shown that outcome-supervised reinforcement learning demonstrates strong performance. However, this approach still suffers from inefficient exploration, sparse reward signals, and ambiguous global reward feedback. To address these challenges, we propose DecEx-RAG, which models RAG as a Markov Decision Process (MDP) incorporating decision-making and execution, while introducing an efficient pruning strategy to optimize data expansion. Through comprehensive process-level policy optimization, DecEx-RAG significantly enhances the autonomous task decomposition, dynamic retrieval, and high-quality answer generation capabilities of large language models (LLMs). Experiments show that DecEx-RAG achieves an average absolute performance improvement of $6.2\%$ across six datasets, significantly outperforming existing baselines. Moreover, the pruning strategy improves data construction efficiency by nearly $6 \times$, providing an efficient solution for process-supervised RAG training. The code is available at https://github.com/sdsxdxl/DecEx-RAG.
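
As a rough illustration of how a pruned search tree can yield step-level preference data for DPO, the sketch below keeps only the highest-reward candidate at each step and records a (chosen, rejected) pair whenever sibling candidates differ in reward. This is not the authors' implementation: the `Candidate` type, the `reward` field, the `expand_with_pruning` function, and the keep-the-best pruning rule are all hypothetical, and the abstract does not specify the actual reward definition or pruning criteria.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str      # a candidate decision or execution output at one step
    reward: float  # hypothetical process reward (e.g. from rollouts)


def expand_with_pruning(steps):
    """Walk a list of per-step candidate lists.

    Assumed pruning rule (illustrative only): continue from the single
    best candidate at each step. A preference pair is emitted only when
    the best and worst candidates have different rewards.
    """
    path, pairs = [], []
    for candidates in steps:
        best = max(candidates, key=lambda c: c.reward)
        worst = min(candidates, key=lambda c: c.reward)
        if best.reward > worst.reward:
            pairs.append({
                "prefix": list(path),      # steps already taken
                "chosen": best.text,
                "rejected": worst.text,
            })
        path.append(best.text)             # prune: keep only the best branch
    return path, pairs


# Toy input: a decision step followed by an execution step with tied rewards.
steps = [
    [Candidate("retrieve", 0.8), Candidate("answer directly", 0.2)],
    [Candidate("sub-question A", 0.5), Candidate("sub-question B", 0.5)],
]
path, pairs = expand_with_pruning(steps)
```

In this toy input the second step produces no pair because its candidates tie, which shows one way uninformative comparisons could be skipped when building process-level training data.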

Paper Structure

This paper contains 31 sections, 2 equations, 2 figures, 8 tables.

Figures (2)

  • Figure 1: Illustration of the framework for DecEx-RAG, which demonstrates the process of search tree expansion and pruning.
  • Figure 2: Ablation results for SFT (a), DPO (b) and different training methods (c).