
APEX-Searcher: Augmenting LLMs' Search Capabilities through Agentic Planning and Execution

Kun Chen, Qingchao Kong, Zhao Feifei, Wenji Mao

Abstract

Retrieval-augmented generation (RAG), based on large language models (LLMs), serves as a vital approach to retrieving and leveraging external knowledge in various domain applications. When confronted with complex multi-hop questions, single-round retrieval is often insufficient for accurate reasoning and problem solving. To enhance search capabilities for complex tasks, most existing works integrate multi-round iterative retrieval with reasoning processes via end-to-end training. While these approaches significantly improve problem-solving performance, they still face challenges in task reasoning and model training, especially ambiguous retrieval execution paths and sparse rewards in the end-to-end reinforcement learning (RL) process, which lead to inaccurate retrieval results and performance degradation. To address these issues, we propose APEX-Searcher, a novel Agentic Planning and Execution framework to augment LLM search capabilities. Specifically, we introduce a two-stage agentic framework that decouples the retrieval process into planning and execution: it first employs RL with decomposition-specific rewards to optimize strategic planning; then, building on the sub-task decomposition, it applies supervised fine-tuning on high-quality multi-hop trajectories to equip the model with robust iterative sub-task execution capabilities. Extensive experiments demonstrate that our proposed framework achieves significant improvements in both multi-hop RAG and task planning performance across multiple benchmarks.

Paper Structure (26 sections, 7 equations, 5 figures, 2 tables, 2 algorithms)

This paper contains 26 sections, 7 equations, 5 figures, 2 tables, 2 algorithms.

Figures (5)

  • Figure 1: Overview of the APEX-Searcher framework. The architecture utilizes RL-driven agentic planning in Stage I to decompose a complex question into a multi-step plan. Subsequently, Stage II employs SFT-guided execution to solve each sub-question using an iterative retrieval loop that features dynamic continuation decisions and context management for final synthesis. See Figure 2 for an example.
  • Figure 2: An example two-stage walkthrough of the APEX-Searcher pipeline, demonstrating the planning and execution processes on a sample complex question.
  • Figure 3: Parameter Sensitivity Analysis on APEX-Searcher. The curves illustrate the impact of the number of retrieved documents (# Doc) and the maximum allowed reasoning hops (# Hop) on model accuracy across four benchmarks. The asterisk ($\star$) denotes the optimal parameter configuration selected for this study and its corresponding performance.
  • Figure 4: (a) shows the Plan score improvements over base models, (b-c) show the reward score convergence during training for the 3B and 7B variants, and (d) shows the optimization of response length over training steps.
  • Figure 5: Comparison of question decomposition strategies.
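
The control flow described in the Figure 1 caption (plan first, then solve each sub-question with a bounded retrieval loop, then synthesize) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `plan_and_execute`, the callables `plan`, `retrieve`, `should_continue`, and `answer`, and the `max_hops` default are all hypothetical placeholders standing in for the RL-trained planner, the retriever, and the SFT-trained executor.

```python
from typing import Callable, List, Tuple

# Hypothetical type alias: (sub_question, sub_answer) pairs carried between steps.
Context = List[Tuple[str, str]]


def plan_and_execute(
    question: str,
    plan: Callable[[str], List[str]],                 # assumed planner: question -> sub-questions
    retrieve: Callable[[str], List[str]],             # assumed retriever: query -> documents
    should_continue: Callable[[str, List[str]], bool],  # assumed continuation decision
    answer: Callable[[str, List[str], Context], str],   # assumed answer generator
    max_hops: int = 3,                                # illustrative cap, not a value from the paper
) -> Tuple[str, Context]:
    """Two-stage sketch: decompose the question, then execute each sub-question."""
    context: Context = []

    # Stage I: decompose the complex question into an ordered plan.
    for sub_question in plan(question):
        docs: List[str] = []

        # Stage II: iterative retrieval, bounded by max_hops.
        for _ in range(max_hops):
            docs.extend(retrieve(sub_question))
            # Stop early once the evidence is judged sufficient.
            if not should_continue(sub_question, docs):
                break

        # Answer the sub-question using its documents and earlier sub-answers.
        context.append((sub_question, answer(sub_question, docs, context)))

    # Final synthesis: answer the original question from the accumulated context.
    return answer(question, [], context), context
```

Passing the components in as callables keeps the sketch independent of any particular model or retriever; in a real system each one would wrap an LLM call or a search index.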