Empowering RepoQA-Agent based on Reinforcement Learning Driven by Monte-carlo Tree Search

Guochang Li, Yuchen Liu, Zhen Qin, Yunkun Wang, Jianping Zhong, Chen Zhi, Binhua Li, Fei Huang, Yongbin Li, Shuiguang Deng

TL;DR

The paper tackles the challenge of repository-level QA by proposing RepoSearch-R1, a cold-start reinforcement learning framework that integrates Monte-carlo Tree Search with Group Relative Policy Optimization to generate diverse, high-quality reasoning trajectories without distillation. It introduces RepoQA-Agent with a tailored toolset for repository exploration and a ReAct-inspired multi-turn interaction protocol, and demonstrates strong improvements in QA completeness and training efficiency on the CoReQA benchmark. A dual-reward design combines LLM-based answer quality with intermediate process rewards, enabling effective on-policy learning from scratch. The results show substantial gains for closed-source models and more variable outcomes for open-source models, underscoring the importance of model capacity for tool-augmented reasoning and highlighting the method’s potential for compliant, self-supervised training of repository-aware agents.

Abstract

Repository-level software engineering tasks require large language models (LLMs) to efficiently navigate and extract information from complex codebases through multi-turn tool interactions. Existing approaches face significant limitations: training-free, in-context learning methods struggle to guide agents effectively in tool utilization and decision-making based on environmental feedback, while training-based approaches typically rely on costly distillation from larger LLMs, introducing data compliance concerns in enterprise environments. To address these challenges, we introduce RepoSearch-R1, a novel agentic reinforcement learning framework driven by Monte-carlo Tree Search (MCTS). This approach allows agents to generate diverse, high-quality reasoning trajectories via self-training without requiring model distillation or external supervision. Based on RepoSearch-R1, we construct a RepoQA-Agent specifically designed for repository question-answering tasks. Comprehensive evaluation on repository question-answering tasks demonstrates that RepoSearch-R1 substantially improves answer completeness, with a 16.0% gain over no-retrieval methods and a 19.5% gain over iterative retrieval methods, and increases training efficiency by 33% compared to general agentic reinforcement learning approaches. Our cold-start training methodology eliminates data compliance concerns while maintaining robust exploration diversity and answer completeness across repository-level reasoning tasks.

Paper Structure

This paper contains 42 sections, 4 equations, 6 figures, 6 tables, 1 algorithm.

Figures (6)

  • Figure 1: QA pairs example in CoReQA: repository question and ground-truth answer
  • Figure 2: Example of multi-turn interaction with code repository showing Thought-Action-Observation pattern
  • Figure 3: Overview of the RepoSearch-R1 framework showing the three-stage training pipeline: (1) MCTS-guided rollout generates diverse exploration trajectories through UCT selection, expansion, simulation, and backpropagation; (2) Trajectory selection and reward computation evaluates trajectories using reward function combining answer quality and process efficiency; (3) Advantage computation and GRPO training updates the policy using group-based normalization. The framework enables self-training agentic reinforcement learning for repository-level question answering without external supervision.
  • Figure 4: Self-Critique Reflection Prompts for Different Exploration States
  • Figure 5: LLM Judge Template for Answer Quality Assessment
  • ...and 1 more figures
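
The Figure 3 caption names two mechanisms that a small example can make concrete: UCT-based child selection during the MCTS rollout, and group-based normalization of rewards for GRPO. The Python sketch below is illustrative only and is not the authors' implementation. The function names, the exploration constant `c`, the dictionary layout of a tree node, and the use of the population standard deviation are all assumptions; the paper's actual reward function and hyperparameters are not reproduced here.

```python
import math
from statistics import mean, pstdev


def uct_score(total_value, visits, parent_visits, c=1.414):
    """Standard UCT score: average value plus an exploration bonus.

    Unvisited children score infinity so they are tried first.
    The constant c is an assumed default, not a value from the paper.
    """
    if visits == 0:
        return math.inf
    exploit = total_value / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore


def select_child(children, parent_visits, c=1.414):
    """Pick the child with the highest UCT score.

    Each child is a hypothetical dict with 'value' (summed reward)
    and 'visits' (visit count).
    """
    return max(
        children,
        key=lambda ch: uct_score(ch["value"], ch["visits"], parent_visits, c),
    )


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of trajectories.

    Each advantage is (reward - group mean) / (group std + eps).
    Population std is an assumption; implementations differ on this.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical usage: two children of a node visited 5 times in total.
children = [
    {"name": "a", "value": 3.0, "visits": 4},
    {"name": "b", "value": 1.0, "visits": 1},
]
chosen = select_child(children, parent_visits=5)

# Hypothetical trajectory rewards for one question.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

In this toy case the less-visited child `b` is chosen because its exploration bonus outweighs the higher average value of `a`, which is the behavior that keeps rollouts diverse. The normalized advantages sum to roughly zero within the group, so trajectories are rewarded relative to their siblings rather than on an absolute scale.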