Amplifying Exploration in Monte-Carlo Tree Search by Focusing on the Unknown
Cedric Derstroff, Jannis Brugger, Jannis Blüml, Mira Mezini, Stefan Kramer, Kristian Kersting
TL;DR
This work tackles the inefficiency of Monte-Carlo Tree Search (MCTS) in large trees, where the algorithm keeps revisiting already explored regions. It introduces AmEx-MCTS, a formulation that decouples value updates, visit counts, and the chosen path. It tracks not-completely-explored subtrees (nces) and uses two action selectors, $a_{max}$ and $a_{select}$, to skip fully explored regions while preserving MCTS principles. A variant, AmÆx-MCTS, further replaces the mean with a max in the UCT update. Theoretical analysis shows convergence to exhaustive search in the limit and preservation of UCT guarantees. Empirical results on three deterministic single-player domains (Chain, ChainLoop, and deterministic FrozenLake) demonstrate substantially broader search coverage and better performance than classical MCTS and MCTS-T. These findings indicate significant efficiency gains for large-scale planning and single-player decision problems, suggesting practical impact for real-time and complex problem solving. The work also lays a foundation for future integration with neural components and end-to-end planning approaches in domains such as game endgames, chemistry, and materials design.
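The TL;DR names the ingredients but not their exact definitions. The Python sketch below is one possible reading, under our own assumptions. $a_{max}$ is taken to be the UCT argmax over all children. $a_{select}$ is taken to be the UCT argmax over children whose nces flag is still set. The AmÆx-style estimate is read as the largest backed-up return instead of the mean; the paper may define it differently. The class `Node`, the helpers `q_value`, `uct`, and `backup`, the constant `c=1.4`, and the numbers in the usage lines are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    visits: int = 0
    value_sum: float = 0.0        # running sum of returns (mean estimate)
    value_max: float = -math.inf  # best return seen (assumed AmÆx-style max estimate)
    nces: bool = True             # "not completely explored subtree" flag
    children: dict = field(default_factory=dict)


def q_value(node, use_max=False):
    # Mean return by default; maximum observed return for the max variant.
    if node.visits == 0:
        return 0.0
    return node.value_max if use_max else node.value_sum / node.visits


def uct(parent, child, c=1.4, use_max=False):
    if child.visits == 0:
        return math.inf
    bonus = c * math.sqrt(math.log(parent.visits) / child.visits)
    return q_value(child, use_max) + bonus


def a_max(parent, **kw):
    # The action plain UCT would pick, fully explored children included.
    return max(parent.children, key=lambda a: uct(parent, parent.children[a], **kw))


def a_select(parent, **kw):
    # The action actually followed: best UCT among children whose subtree is still open.
    open_actions = [a for a, ch in parent.children.items() if ch.nces]
    if not open_actions:
        return None  # the parent itself is completely explored
    return max(open_actions, key=lambda a: uct(parent, parent.children[a], **kw))


def backup(path, ret):
    # Plain path backup that keeps both the mean and the max statistics.
    for node in path:
        node.visits += 1
        node.value_sum += ret
        node.value_max = max(node.value_max, ret)


root = Node(visits=10)
root.children["left"] = Node(visits=6, value_sum=5.4, value_max=0.9, nces=False)
root.children["right"] = Node(visits=4, value_sum=1.2, value_max=1.0)
print(a_max(root), a_select(root))                              # left right
print(a_max(root, use_max=True), a_select(root, use_max=True))  # right right
```

In the mean-based case, UCT still prefers the fully explored `left` child, so $a_{max}$ and $a_{select}$ disagree; a search that followed $a_{max}$ would spend the iteration on a subtree with nothing left to discover. With the max-based estimate, the open child's best return of 1.0 makes it the UCT choice as well.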
Abstract
Monte-Carlo tree search (MCTS) is an effective anytime algorithm with a vast number of applications. It strategically allocates computational resources to focus on promising segments of the search tree, making it a very attractive search algorithm in large search spaces. However, it often expends its limited resources on reevaluating previously explored regions when they remain the most promising path. Our proposed method, called AmEx-MCTS, solves this problem by introducing a novel MCTS formulation. Central to AmEx-MCTS is the decoupling of value updates, visit count updates, and the selected path during the tree search, which enables the exclusion of already explored subtrees or leaves. This separation preserves the utility of visit counts both for balancing exploration and exploitation and as quality metrics within MCTS. The resulting augmentation enables a considerably broader search with identical computational resources while preserving the essential characteristics of MCTS. The expanded coverage not only yields more precise estimates but also proves instrumental in tackling larger and more complex problems. Our empirical evaluation demonstrates the superior performance of AmEx-MCTS, which surpasses classical MCTS and related approaches by a substantial margin.
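To illustrate the effect of excluding explored subtrees under a fixed budget, the sketch below runs UCT on a hypothetical deterministic toy problem with and without that exclusion. The problem is a complete binary tree of depth 4 with made-up leaf rewards and an always-take-action-0 rollout. The sketch counts how many new nodes each run expands. It deliberately uses ordinary path-based value and visit updates. It does not reproduce the decoupled bookkeeping of AmEx-MCTS, whose exact rules are not given here. `DEPTH`, `reward`, `rollout`, `search`, and the `restricted` switch are assumptions made for illustration.

```python
import math

DEPTH = 4  # hypothetical toy problem: complete binary tree, actions 0 and 1


def is_terminal(state):
    return len(state) == DEPTH


def reward(state):
    # Hypothetical deterministic leaf reward: 1.0 for all zeros, lower for each 1.
    return 1.0 - sum(state) / DEPTH


def rollout(state):
    # Deterministic default policy (an assumption): take action 0 until a leaf.
    while not is_terminal(state):
        state = state + (0,)
    return reward(state)


class Node:
    def __init__(self, state):
        self.state = state
        self.children = {}
        self.untried = [] if is_terminal(state) else [0, 1]
        self.visits = 0
        self.value_sum = 0.0
        # nces: "not completely explored subtree"; terminal leaves start as explored.
        self.nces = not is_terminal(state)


def uct(parent, child, c=1.4):
    if child.visits == 0:
        return math.inf
    mean = child.value_sum / child.visits
    return mean + c * math.sqrt(math.log(parent.visits) / child.visits)


def search(budget, restricted):
    """Run `budget` iterations; return (root, number of newly expanded nodes)."""
    root = Node(())
    expanded = 0
    for _ in range(budget):
        if restricted and not root.nces:
            break  # the whole tree has been explored: exhaustive search reached
        node, path = root, [root]
        # Selection: descend through fully expanded, non-terminal nodes.
        while not node.untried and node.children:
            parent = node
            # Restricted mode only considers children whose subtree is still open.
            candidates = [ch for ch in parent.children.values() if ch.nces or not restricted]
            node = max(candidates, key=lambda ch: uct(parent, ch))
            path.append(node)
        # Expansion: add one untried child, if any (terminal leaves have none).
        if node.untried:
            child = Node(node.state + (node.untried.pop(0),))
            node.children[child.state[-1]] = child
            path.append(child)
            node = child
            expanded += 1
        ret = rollout(node.state)
        # Backup, bottom-up, also refreshing the nces flags along the path.
        for n in reversed(path):
            n.visits += 1
            n.value_sum += ret
            n.nces = bool(n.untried) or any(ch.nces for ch in n.children.values())
    return root, expanded


total = 2 ** (DEPTH + 1) - 2  # 30 non-root nodes in the toy tree
root_r, n_r = search(total, restricted=True)
root_p, n_p = search(total, restricted=False)
print(n_r, root_r.nces)  # 30 False
print(n_p)               # at most 30
```

With `restricted=True`, every iteration expands exactly one new node. After 30 iterations all non-root nodes exist and the root's `nces` flag turns off, which mirrors the convergence-to-exhaustive-search property in this toy setting. Plain UCT also expands at most one node per iteration, but it may spend iterations re-evaluating leaves it has already reached, so its count can never exceed the restricted run's.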
