TSAPR: A Tree Search Framework For Automated Program Repair
Haichuan Hu, Ye Shang, Weifeng Sun, Quanjun Zhang
TL;DR
TSAPR is a tree-search framework for LLM-based automated program repair that integrates Monte Carlo Tree Search (MCTS) with chain-of-thought (CoT) prompting and self-reflection in an evaluate-and-improve loop. The approach builds a Patch Tree to explore multiple repair trajectories, using UCT (Upper Confidence bounds applied to Trees) for node selection, two evaluation strategies (LLM-as-Judge and Test-as-Judge) to score patch quality, and Q-value backpropagation to guide the search. Empirical results across Defects4J, QuixBugs, ConDefects, SWE-Bench-Lite, and VUL4J show that TSAPR achieves state-of-the-art repair performance and notable cost efficiency, repairing 201/835 bugs on Defects4J, 27/79 vulnerabilities on VUL4J, and 164/300 issues on SWE-Bench-Lite, while remaining compatible with multiple programming languages and base LLMs. The gains in repair effectiveness and efficiency are most pronounced for complex or multi-step bugs, and the code is open source to support adoption and reproducibility.
Abstract
With the rapid advancement of Large Language Models (LLMs), traditional Automated Program Repair (APR) techniques have undergone significant transformation. Training-free approaches, such as zero-shot and few-shot prompting, are increasingly favored over fine-tuning-based methods, leveraging the strong code understanding and generation capabilities of LLMs to improve repair effectiveness. However, most existing LLM-based APR systems still follow a trial-and-error paradigm, which faces two fundamental challenges: (1) limited patch quality due to myopic, local exploration; and (2) inefficient search processes caused by redundant or unguided patch generation. To address these limitations, we propose TSAPR, a Tree Search-based APR framework designed for diverse types of software defects. Unlike conventional approaches, TSAPR adopts an evaluate-and-improve paradigm that systematically guides the repair process. Specifically, it integrates Monte Carlo Tree Search (MCTS) into patch exploration, enabling global assessment of candidate patches and prioritizing the most promising ones for iterative refinement and generation. By supporting long-trajectory, multi-path exploration, TSAPR significantly enhances search efficiency while maintaining high flexibility and generality. This design makes it applicable to a wide range of defect types and compatible with various base LLMs. We evaluate TSAPR across five widely used bug and vulnerability benchmarks. Experimental results show that TSAPR successfully repairs 201 out of 835 bugs in Defects4J, outperforming all state-of-the-art baselines. TSAPR also fixes 27 of the 79 vulnerabilities in VUL4J and resolves 164 out of 300 issues in SWE-Bench-Lite, demonstrating its broad effectiveness across different defect categories and real-world development scenarios.
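To make the evaluate-and-improve loop concrete, the following is a minimal sketch of a generic MCTS over a patch tree, with UCT selection and Q-value backpropagation as named in the summary above. It is not TSAPR's implementation: the names `PatchNode`, `search`, `refine`, and `evaluate`, the exploration constant, the branching factor, and the reward scale in [0, 1] are all assumptions made for illustration. In a real system, `refine` would stand for an LLM call (for example with CoT and self-reflection) and `evaluate` for an LLM-as-Judge or Test-as-Judge scorer; here they are replaced by deterministic toy functions so the sketch runs on its own.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class PatchNode:
    """One candidate patch in the patch tree (field names are illustrative)."""
    patch: str
    parent: Optional["PatchNode"] = None
    children: List["PatchNode"] = field(default_factory=list)
    visits: int = 0
    q_value: float = 0.0  # running mean of rewards backed up through this node


def uct(node: PatchNode, c: float = 1.41) -> float:
    """Standard UCT score: mean reward plus an exploration bonus."""
    if node.visits == 0:
        return math.inf  # unvisited patches are tried first
    parent_visits = node.parent.visits if node.parent else node.visits
    return node.q_value + c * math.sqrt(math.log(parent_visits) / node.visits)


def select(root: PatchNode) -> PatchNode:
    """Descend from the root, always following the child with the highest UCT."""
    node = root
    while node.children:
        node = max(node.children, key=uct)
    return node


def backpropagate(node: Optional[PatchNode], reward: float) -> None:
    """Update visit counts and mean Q-values from a node up to the root."""
    while node is not None:
        node.visits += 1
        node.q_value += (reward - node.q_value) / node.visits
        node = node.parent


def search(
    buggy_code: str,
    refine: Callable[[str], str],      # assumed stand-in for an LLM patch generator
    evaluate: Callable[[str], float],  # assumed stand-in for a judge; returns a score in [0, 1]
    iterations: int = 20,
    branching: int = 2,
) -> Tuple[str, float]:
    """Return the best (patch, reward) found; stops early on a perfect score."""
    root = PatchNode(patch=buggy_code)
    best_patch, best_reward = buggy_code, evaluate(buggy_code)
    backpropagate(root, best_reward)
    for _ in range(iterations):
        leaf = select(root)
        for _ in range(branching):  # expand the selected patch into refined children
            child = PatchNode(patch=refine(leaf.patch), parent=leaf)
            leaf.children.append(child)
            reward = evaluate(child.patch)
            backpropagate(child, reward)
            if reward > best_reward:
                best_patch, best_reward = child.patch, reward
            if reward >= 1.0:
                return best_patch, best_reward
    return best_patch, best_reward


# Toy stand-ins: a "patch" is a counter, and three refinement steps fix the bug.
def toy_refine(patch: str) -> str:
    return str(int(patch) + 1)


def toy_evaluate(patch: str) -> float:
    return min(int(patch), 3) / 3
```

Calling `search("0", toy_refine, toy_evaluate)` walks the toy tree until a patch scores 1.0. The point of the structure is that rewards flow back up the tree, so a later selection can return to a promising partial patch and refine it further, rather than sampling independent patches from scratch.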
