Empowering RepoQA-Agent based on Reinforcement Learning Driven by Monte-carlo Tree Search
Guochang Li, Yuchen Liu, Zhen Qin, Yunkun Wang, Jianping Zhong, Chen Zhi, Binhua Li, Fei Huang, Yongbin Li, Shuiguang Deng
TL;DR
The paper tackles the challenge of repository-level QA by proposing RepoSearch-R1, a cold-start reinforcement learning framework that integrates Monte-carlo Tree Search with Group Relative Policy Optimization to generate diverse, high-quality reasoning trajectories without distillation. It introduces RepoQA-Agent with a tailored toolset for repository exploration and a ReAct-inspired multi-turn interaction protocol, and demonstrates strong improvements in QA completeness and training efficiency on the CoReQA benchmark. A dual-reward design combines LLM-based answer quality with intermediate process rewards, enabling effective on-policy learning from scratch. The results show substantial gains for closed-source models and more variable outcomes for open-source models, underscoring the importance of model capacity for tool-augmented reasoning and highlighting the method’s potential for compliant, self-supervised training of repository-aware agents.
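As a rough illustration of the dual-reward idea combined with Group Relative Policy Optimization, the sketch below scores a group of trajectories sampled for the same question and normalizes each reward against the group mean and standard deviation, which is the standard group-relative advantage. The additive combination, the `process_weight` parameter, and both function names are assumptions made for illustration; the summary above does not state how the paper combines its two rewards or how MCTS rollouts are grouped.

```python
from statistics import mean, pstdev


def combined_reward(answer_score, process_score, process_weight=0.5):
    """Blend a final-answer score with an intermediate process score.

    The additive form and the default weight are hypothetical; they are
    not taken from the paper.
    """
    return answer_score + process_weight * process_score


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled trajectories.

    Each advantage is (reward - group mean) / (group std + eps), so
    trajectories are ranked only relative to their siblings.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Three hypothetical trajectories for one question:
    # (answer quality, process quality) pairs with made-up values.
    scores = [(0.2, 0.5), (0.6, 0.5), (0.9, 0.1)]
    rewards = [combined_reward(a, p) for a, p in scores]
    print(group_relative_advantages(rewards))
```

Because the advantages are centered on the group mean, a trajectory is reinforced only when it beats its siblings, which is why no separately trained value model is needed in this style of update.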
Abstract
Repository-level software engineering tasks require large language models (LLMs) to efficiently navigate and extract information from complex codebases through multi-turn tool interactions. Existing approaches face significant limitations: training-free, in-context learning methods struggle to guide agents effectively in tool utilization and decision-making based on environmental feedback, while training-based approaches typically rely on costly distillation from larger LLMs, introducing data compliance concerns in enterprise environments. To address these challenges, we introduce RepoSearch-R1, a novel agentic reinforcement learning framework driven by Monte-carlo Tree Search (MCTS). This approach allows agents to generate diverse, high-quality reasoning trajectories via self-training without requiring model distillation or external supervision. Based on RepoSearch-R1, we construct a RepoQA-Agent specifically designed for repository question-answering tasks. Comprehensive evaluation on repository question-answering tasks demonstrates that RepoSearch-R1 substantially improves answer completeness, with a 16.0% gain over no-retrieval methods and a 19.5% gain over iterative retrieval methods, and increases training efficiency by 33% compared to general agentic reinforcement learning approaches. Our cold-start training methodology eliminates data compliance concerns while maintaining robust exploration diversity and answer completeness across repository-level reasoning tasks.
