Empowering RepoQA-Agent based on Reinforcement Learning Driven by Monte-carlo Tree Search

Guochang Li; Yuchen Liu; Zhen Qin; Yunkun Wang; Jianping Zhong; Chen Zhi; Binhua Li; Fei Huang; Yongbin Li; Shuiguang Deng

TL;DR

The paper addresses repository-level question answering (QA) with RepoSearch-R1, a cold-start reinforcement learning framework that integrates Monte Carlo Tree Search (MCTS) with Group Relative Policy Optimization (GRPO) to generate diverse, high-quality reasoning trajectories without distillation. It introduces RepoQA-Agent, which pairs a toolset tailored to repository exploration with a ReAct-inspired multi-turn interaction protocol, and it reports improvements in answer completeness and training efficiency on the CoReQA benchmark. A dual-reward design combines an LLM-based answer-quality reward with intermediate process rewards, enabling on-policy learning from scratch. The results show substantial gains for closed-source models and more variable outcomes for open-source models, which underscores the importance of model capacity for tool-augmented reasoning and highlights the method's potential for compliant, self-supervised training of repository-aware agents.

Abstract

Repository-level software engineering tasks require large language models (LLMs) to efficiently navigate and extract information from complex codebases through multi-turn tool interactions. Existing approaches face significant limitations: training-free, in-context learning methods struggle to guide agents in tool use and in decision-making based on environmental feedback, while training-based approaches typically rely on costly distillation from larger LLMs, which introduces data compliance concerns in enterprise environments. To address these challenges, we introduce RepoSearch-R1, a novel agentic reinforcement learning framework driven by Monte Carlo Tree Search (MCTS). This approach allows agents to generate diverse, high-quality reasoning trajectories via self-training, without requiring model distillation or external supervision. Based on RepoSearch-R1, we construct RepoQA-Agent, an agent designed specifically for repository question-answering tasks. Comprehensive evaluation on repository question-answering tasks demonstrates that RepoSearch-R1 improves answer completeness by 16.0% over no-retrieval methods and by 19.5% over iterative retrieval methods, and increases training efficiency by 33% compared with general agentic reinforcement learning approaches. Our cold-start training methodology eliminates data compliance concerns while maintaining robust exploration diversity and answer completeness across repository-level reasoning tasks.
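
The TL;DR names Group Relative Policy Optimization as the policy-update rule. As a rough illustration of the "group relative" part, the sketch below normalizes the rewards of several trajectories sampled for the same question against that group's mean and standard deviation. This is the commonly described GRPO advantage, not necessarily the exact variant used in RepoSearch-R1. The function name, the `eps` constant, the choice of population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each trajectory relative to its own sampling group.

    `rewards` holds one scalar reward per trajectory sampled for the same
    question. `eps` (an assumed constant) avoids division by zero when all
    rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ here
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four trajectories answering one question.
advantages = group_relative_advantages([0.2, 0.4, 0.6, 0.8])
```

Trajectories that score above the group mean receive a positive advantage and those below it a negative one, so no separate value network is needed to form the baseline.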
