Bilevel MCTS for Amortized O(1) Node Selection in Classical Planning
Masataro Asai
TL;DR
The paper addresses the bottleneck of node selection in bandit-based Monte-Carlo Tree Search for classical planning by introducing Bilevel MCTS, which performs a budgeted best-first search from the selected leaf to achieve amortized $O(1)$ selection when using an array-based priority queue. It strengthens this core idea with Tree Collapsing to curtail tree depth and pairs these with orthogonal techniques (novelty BFWS, boosted preferred operators, and alternation queues) to form Nbula, a configuration that outperforms several state-of-the-art planners on Agile IPC benchmarks, especially in deeper searches. Extended 30-minute runs further demonstrate Nbula’s superior coverage, though memory usage presents a practical constraint. Overall, the work advances scalable, statistically guided planning by balancing high-level MCTS decisions with low-level, fast data-structure-driven node management.
Abstract
We study an efficient implementation of Multi-Armed Bandit (MAB)-based Monte-Carlo Tree Search (MCTS) for classical planning. One weakness of MCTS is that it spends significant time deciding which node to expand next. Selecting a node from an OPEN list with $N$ nodes takes $O(1)$ time with traditional array-based priority queues for dense integer keys, whereas the tree-based OPEN list used by MCTS requires $O(\log N)$ time, which roughly corresponds to the search depth $d$. In classical planning, $d$ can be arbitrarily large (e.g., $2^k-1$ in $k$-disk Tower-of-Hanoi), so the runtime for node selection is significant. This is unlike game tree search, where the selection cost is negligible compared to node evaluation (rollouts) because $d$ is inherently limited by the game (e.g., $d\leq 361$ in Go). To address this bottleneck, we propose a bilevel modification to MCTS that runs a best-first search from each selected leaf node with an expansion budget proportional to $d$. This achieves amortized $O(1)$ runtime for node selection, equivalent to the traditional queue-based OPEN list. In addition, we introduce Tree Collapsing, an enhancement that reduces the number of action selection steps and further improves performance.
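For readers unfamiliar with the array-based priority queue mentioned above, the following is a minimal sketch of a bucket queue over dense integer keys such as heuristic values. It is an illustration only, not the paper's implementation: the names `BucketQueue` and `max_key`, the FIFO tie-breaking, and the example keys are all assumptions made here.

```python
from collections import deque


class BucketQueue:
    """Array-based priority queue for dense integer keys in [0, max_key].

    Hypothetical illustration; not taken from the paper.
    """

    def __init__(self, max_key):
        # One FIFO bucket per possible key value.
        self.buckets = [deque() for _ in range(max_key + 1)]
        # Invariant: every non-empty bucket has index >= min_key.
        self.min_key = max_key + 1
        self.size = 0

    def push(self, key, item):
        # Constant time: index directly into the array of buckets.
        self.buckets[key].append(item)
        if key < self.min_key:
            self.min_key = key
        self.size += 1

    def pop(self):
        if self.size == 0:
            raise IndexError("pop from empty BucketQueue")
        # Skip empty buckets; the scan is bounded by the key range,
        # not by the number of stored nodes.
        while not self.buckets[self.min_key]:
            self.min_key += 1
        self.size -= 1
        return self.min_key, self.buckets[self.min_key].popleft()


# Usage: keys play the role of integer heuristic values (assumed here).
open_list = BucketQueue(max_key=10)
open_list.push(5, "a")
open_list.push(2, "b")
open_list.push(5, "c")
first = open_list.pop()  # lowest key comes out first
```

The cost of `pop` here does not grow with the number of stored nodes, only with the gap between occupied keys. A tree-based OPEN list, by contrast, must descend from the root on every selection, which is the cost the bilevel scheme spreads over a budget of expansions proportional to $d$.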
