Doctor-RAG: Failure-Aware Repair for Agentic Retrieval-Augmented Generation

Shuguang Jiao, Chengkai Huang, Shuhan Qi, Xuan Wang, Yifan Li, Lina Yao

Abstract

Agentic Retrieval-Augmented Generation (Agentic RAG) has become a widely adopted paradigm for multi-hop question answering and complex knowledge reasoning, where retrieval and reasoning are interleaved at inference time. As reasoning trajectories grow longer, failures become increasingly common. Existing approaches typically address such failures by either stopping at diagnostic analysis or rerunning the entire retrieval-reasoning pipeline, which leads to substantial computational overhead and redundant reasoning. In this paper, we propose Doctor-RAG (DR-RAG), a unified diagnose-and-repair framework that corrects failures in Agentic RAG through explicit error localization and prefix reuse, enabling minimal-cost intervention. DR-RAG decomposes failure handling into two consecutive stages: (i) trajectory-level failure diagnosis and localization, which attributes errors to a coverage-gated taxonomy and identifies the earliest failure point in the reasoning trajectory; and (ii) tool-conditioned local repair, which intervenes only at the diagnosed failure point while maximally reusing validated reasoning prefixes and retrieved evidence. By explicitly separating error attribution from correction, DR-RAG enables precise error localization, thereby avoiding expensive full-pipeline reruns and enabling targeted, efficient repair. We evaluate DR-RAG across three multi-hop question answering benchmarks, multiple agentic RAG baselines, and different backbone models. Experimental results demonstrate that DR-RAG substantially improves answer accuracy while significantly reducing reasoning token consumption compared to rerun-based repair strategies.
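The two-stage pipeline described in the abstract can be sketched in code. The sketch below is a minimal illustration under assumed names: `Step`, `Trajectory`, `diagnose`, `repair`, and the error-taxonomy labels are all hypothetical placeholders, not the paper's actual implementation or taxonomy.

```python
from dataclasses import dataclass

# Hypothetical error taxonomy; the paper uses a coverage-gated taxonomy
# whose exact labels are not reproduced here.
ERROR_TYPES = ("retrieval_gap", "evidence_misuse", "reasoning_logic", "answer_synthesis")

@dataclass
class Step:
    thought: str
    action: str
    observation: str

@dataclass
class Trajectory:
    question: str
    steps: list
    answer: str

def diagnose(traj, classify_step):
    """Stage 1: attribute the failure and locate the EARLIEST faulty step.

    `classify_step` is an assumed callable (e.g. an LLM judge) that inspects
    the trajectory prefix and returns a label from ERROR_TYPES, or None if
    the latest step is still valid."""
    for i in range(len(traj.steps)):
        label = classify_step(traj.question, traj.steps[: i + 1])
        if label is not None:
            return i, label  # earliest failure point and its error type
    return None, None  # no step-level fault found

def repair(traj, fail_idx, label, regenerate_from):
    """Stage 2: tool-conditioned local repair. The validated prefix (steps
    before `fail_idx`) and its retrieved evidence are reused verbatim; only
    the suffix is regenerated, with retrieval re-invoked only when the
    diagnosis indicates missing evidence."""
    prefix = traj.steps[:fail_idx]
    need_retrieval = (label == "retrieval_gap")
    new_suffix, new_answer = regenerate_from(traj.question, prefix, need_retrieval)
    return Trajectory(traj.question, prefix + new_suffix, new_answer)
```

The point of the separation is cost: a full-pipeline rerun regenerates every step, whereas this loop pays only for the suffix after the diagnosed failure point.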

Paper Structure

This paper contains 37 sections, 5 equations, 5 figures, and 6 tables.

Figures (5)

  • Figure 1: The statistics of different error types in Agentic RAG under the ReAct baseline on HotpotQA (500 samples).
  • Figure 2: Overview of DR-RAG. Given a failed agentic RAG trajectory, the framework performs taxonomy-constrained diagnosis and localization, followed by tool-conditioned local repair that intervenes at the failure point while reusing validated prefixes.
  • Figure 3: Accuracy of the automated diagnosis module evaluated against human annotations across different datasets and baselines.
  • Figure 4: Automated error diagnosis and repair analysis. (a) Confusion matrix aggregated across multiple datasets (2Wiki, HotpotQA, and MuSiQue) and baselines (ReAct, Search-o1, and Search-R1), showing high consistency between the diagnostic model and human annotations. (b) Comparison of average repair rates across error types on HotpotQA, aggregating results from the three baselines under the repair strategies.
  • Figure 5: Case study of a reasoning logic error under full evidence coverage. DR-RAG localizes the earliest faulty reasoning step and repairs the error by reusing retrieved evidence, avoiding unnecessary retrieval and full trajectory regeneration.