
Bilevel MCTS for Amortized O(1) Node Selection in Classical Planning

Masataro Asai

TL;DR

The paper addresses the bottleneck of node selection in bandit-based Monte-Carlo Tree Search for classical planning by introducing Bilevel MCTS, which performs a budgeted best-first search from the selected leaf to achieve amortized $O(1)$ selection when using an array-based priority queue. It strengthens this core idea with Tree Collapsing, which curtails tree depth, and combines both with orthogonal techniques (novelty BFWS, boosted preferred operators, and alternation queues) to form Nbula, a configuration that outperforms several state-of-the-art planners on the Agile IPC benchmarks, especially in deeper searches. Extended 30-minute runs further demonstrate Nbula’s superior coverage, although memory usage becomes a practical constraint. Overall, the work makes MAB-based MCTS scale to deep classical-planning searches by pairing high-level MCTS decisions with low-level node management driven by fast priority-queue data structures.
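
The $O(1)$ selection cost of a queue-based OPEN list assumes an array-based ("bucket") priority queue over dense integer keys, such as integer heuristic values. The Python sketch below is a minimal illustration under that assumption; the class `BucketQueue` and its methods are hypothetical names, not the paper's implementation. Push is constant time, and pop is constant time except for skipping empty buckets, a cost bounded by the key range rather than by the number of stored nodes $N$.

```python
from collections import deque


class BucketQueue:
    """Hypothetical array-based priority queue for small non-negative integer keys.

    buckets[k] holds the items with key k; min_key is a lower bound on the
    smallest non-empty bucket. Skipping empty buckets costs at most the key
    range, independent of the number of stored items.
    """

    def __init__(self, max_key: int):
        self.buckets = [deque() for _ in range(max_key + 1)]
        self.min_key = max_key + 1  # sentinel: nothing stored yet
        self.size = 0

    def push(self, key: int, item) -> None:
        self.buckets[key].append(item)  # FIFO tie-breaking within a bucket
        self.min_key = min(self.min_key, key)
        self.size += 1

    def pop(self):
        if self.size == 0:
            raise IndexError("pop from empty BucketQueue")
        while not self.buckets[self.min_key]:  # skip empty buckets
            self.min_key += 1
        self.size -= 1
        return self.min_key, self.buckets[self.min_key].popleft()

    def __len__(self) -> int:
        return self.size


q = BucketQueue(max_key=10)
for h, name in [(3, "a"), (1, "b"), (3, "c"), (0, "d")]:
    q.push(h, name)
print([q.pop() for _ in range(len(q))])  # [(0, 'd'), (1, 'b'), (3, 'a'), (3, 'c')]
```

A binary heap would give $O(\log N)$ per operation instead; the bucket layout trades memory proportional to the key range for constant-time access.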

Abstract

We study an efficient implementation of Multi-Armed Bandit (MAB)-based Monte-Carlo Tree Search (MCTS) for classical planning. One weakness of MCTS is that it spends significant time deciding which node to expand next. Selecting a node from an OPEN list with $N$ nodes takes $O(1)$ time with traditional array-based priority queues for dense integer keys, whereas the tree-based OPEN list used by MCTS requires $O(\log N)$ time, which roughly corresponds to the search depth $d$. In classical planning, $d$ can be arbitrarily large (e.g., $2^k-1$ in $k$-disk Tower-of-Hanoi), so the runtime for node selection is significant. This is unlike game tree search, where the cost is negligible compared to node evaluation (rollouts) because $d$ is inherently limited by the game (e.g., $d\leq 361$ in Go). To address this bottleneck, we propose a bilevel modification to MCTS that runs a best-first search from each selected leaf node with an expansion budget proportional to $d$, which achieves amortized $O(1)$ runtime for node selection, equivalent to the traditional queue-based OPEN list. In addition, we introduce Tree Collapsing, an enhancement that reduces action selection steps and further improves performance.
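
The bilevel loop described in the abstract can be sketched in Python as follows. This is a simplified, hypothetical reading, not the paper's algorithm: the upper level uses a UCB1-style descent that minimizes heuristic values (assumed greedy "min" backup, visit count = subtree size), the lower level is a greedy best-first search from the selected leaf with a budget of `budget_factor * max(1, d)` expansions, nodes generated by the inner search are assumed to join the MCTS tree, and duplicates are pruned with a global `seen` set. All names (`bilevel_mcts`, `select`, `refresh`, `budget_factor`, ...) are illustrative. For brevity the inner search uses `heapq` rather than an array-based queue.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field
from typing import Callable, Hashable, Iterable, List, Optional


@dataclass(eq=False)
class Node:
    state: Hashable
    h: float
    parent: Optional["Node"] = None
    depth: int = 0
    children: List["Node"] = field(default_factory=list)
    expanded: bool = False
    exhausted: bool = False  # expanded, with no unexpanded node left below it
    value: float = math.inf  # assumed greedy backup: min h over open leaves below
    visits: int = 1          # subtree size, used as the UCB1 visit count


def refresh(node: Node) -> None:
    """Recompute a node's statistics from its children: O(branching factor)."""
    live = [c for c in node.children if not c.exhausted]
    node.exhausted = node.expanded and not live
    node.value = min((c.value for c in live), default=math.inf) if node.expanded else node.h
    node.visits = 1 + sum(c.visits for c in node.children)


def refresh_subtree(root: Node) -> None:
    """Refresh every node below `root`, children before parents."""
    order, stack = [], [root]
    while stack:
        n = stack.pop()
        order.append(n)
        stack.extend(n.children)
    for n in reversed(order):
        refresh(n)


def select(root: Node, c: float, stats: dict) -> Node:
    """Upper level: UCB1-style descent (minimizing h) to an unexpanded leaf."""
    node = root
    while node.expanded:
        live = [ch for ch in node.children if not ch.exhausted]
        stats["selection_steps"] += len(live)  # ~B comparisons per level
        log_n = math.log(node.visits)
        node = min(live, key=lambda ch: ch.value - c * math.sqrt(log_n / ch.visits))
    return node


def bilevel_mcts(start: Hashable,
                 successors: Callable[[Hashable], Iterable[Hashable]],
                 h: Callable[[Hashable], float],
                 is_goal: Callable[[Hashable], bool],
                 c: float = 1.0, budget_factor: int = 1, max_expansions: int = 10_000):
    stats = {"selection_steps": 0, "expansions": 0}
    root = Node(start, h(start))
    seen = {start}
    tie = itertools.count()  # FIFO tie-breaking; heap entries never compare Nodes
    while not root.exhausted and stats["expansions"] < max_expansions:
        leaf = select(root, c, stats)
        # Lower level: GBFS from the leaf with a budget proportional to its depth.
        budget = budget_factor * max(1, leaf.depth)
        frontier = [(leaf.h, next(tie), leaf)]
        while frontier and budget > 0:
            _, _, n = heapq.heappop(frontier)
            if is_goal(n.state):
                return n, stats
            n.expanded = True
            stats["expansions"] += 1
            budget -= 1
            for s in successors(n.state):
                if s not in seen:  # global duplicate detection
                    seen.add(s)
                    child = Node(s, h(s), parent=n, depth=n.depth + 1)
                    n.children.append(child)
                    heapq.heappush(frontier, (child.h, next(tie), child))
        # Unexpanded frontier nodes stay in the tree as leaves for later selections.
        refresh_subtree(leaf)
        a = leaf.parent
        while a is not None:  # backpropagate once along the selected path
            refresh(a)
            a = a.parent
    return None, stats


def path(node: Node) -> list:
    states = []
    while node is not None:
        states.append(node.state)
        node = node.parent
    return states[::-1]
```

Each iteration pays one descent of roughly $B \cdot d$ comparisons but then performs up to `budget_factor * d` expansions, so the descent cost per expansion stays bounded by about $B / \text{budget\_factor}$ regardless of depth.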

Paper Structure

This paper contains 25 sections, 5 theorems, 5 equations, 8 figures, 12 tables, and 3 algorithms.

Key Result

Theorem 1

Assume that the search space forms a tree with a constant branching factor $B$ and that we have a tree-based OPEN list of depth $D$ containing $N = B^D$ leaves. The runtime of the selection step is $BD = O(\log N)$.
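
The bound in Theorem 1 follows from $D = \log_B N$. Combining it with the expansion budget proportional to depth described in the abstract gives the amortized bound. The worked derivation below assumes a budget of exactly $kD$ expansions for a hypothetical constant $k \ge 1$ and counts only the cost of descending the tree, not the inner search's own queue operations (which are $O(1)$ per node with an array-based priority queue).

```latex
% Complete B-ary tree with N = B^D leaves, so D = \log_B N.
\begin{align*}
  \text{selection cost} &= \underbrace{B}_{\text{children compared per level}}
      \cdot \underbrace{D}_{\text{levels}}
    = B \log_B N = \frac{B}{\ln B}\,\ln N = O(\log N)
    \quad (B \text{ constant}),\\
  \text{amortized cost per expansion} &= \frac{BD}{kD} = \frac{B}{k} = O(1)
    \quad \text{for a budget of } kD \text{ expansions}.
\end{align*}
```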

Figures (8)

  • Figure 1: MCTS vs. Bilevel MCTS.
  • Figure 2: Left: Comparing the number of node evaluations per second on IPC instances solved by both GBFS ($x$-axis) and GUCTN2 ($y$-axis) within the limit, using the $h^{\mathrm{FF}}$ heuristic. Points below the diagonal indicate that GUCTN2 has significantly slower node evaluations. Middle, Right: Log-log plots comparing the number of node evaluations per second ($y$-axis) against the average depth of the nodes evaluated during the search ($x$-axis) for $h^{\mathrm{FF}}$. GUCTN2 (middle) becomes slower as the search goes deeper. Bilevel GUCTN2 (right) explores deeper yet shows less degradation in nodes/sec. The effect is pronounced in termes, ricochet, rubiks, and data-network.
  • Figure 3: Example illustrating Tree Collapsing with $\theta=10$.
  • Figure 4: A histogram plot of the number of IPC18+IPC23 instances solved within 30 minutes. The lines indicate the average of 5 seeds, while the bands indicate the maximum and the minimum among the seeds.
  • Figure 5: ($h^{\mathrm{FF}}$, $h^{\mathrm{CG}}$, $h^{\mathrm{CEA}}$ results, in order) Comparing the number of node evaluations per second on IPC instances solved by both GBFS ($x$-axis) and GUCTN2 ($y$-axis) within the limit. Points below the diagonal indicate that the latter has significantly slower node evaluations.
  • Figures 6 to 8 are not listed here.

Theorems & Definitions (7)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 2
  • proof
  • Theorem 3
  • proof