Exponential Speedups by Rerooting Levin Tree Search

Laurent Orseau, Marcus Hutter, Levi H. S. Lelis

TL;DR

The paper studies policy-guided tree search in deterministic environments and introduces $\sqrt{\text{LTS}}$ (root-LTS), which implicitly runs an LTS search from every visited node and uses a rerooter to allocate search effort across these searches. It builds a cost framework on self-counting cost functions, notably the slenderness-based cost $\tfrac{\lambda}{\pi}$ and its rooted variant, and shows how to compose costs to bound the search cost under a decomposition into subtasks. The main results are the algorithm itself and theoretical guarantees that the number of node visits $T$ is bounded by a weighted maximum over subtasks, which yields speedups when clues and structure are favorable and robustness to clue overload. The work offers a foundation for learning both the policy and the rerooter, with applicability to domains such as planning, theorem proving, and program synthesis, and suggests directions for extending to stochastic settings.

Abstract

Levin Tree Search (LTS) (Orseau et al., 2018) is a search algorithm for deterministic environments that uses a user-specified policy to guide the search. It comes with a formal guarantee on the number of search steps (node visits) for finding a solution node that depends on the quality of the policy. In this paper, we introduce a new algorithm, called $\sqrt{\text{LTS}}$ (pronounced root-LTS), which implicitly starts an LTS search rooted at every node of the search tree. Each LTS search is assigned a rerooting weight by a (user-defined or learnt) rerooter, and the search effort is shared between all LTS searches proportionally to their weights. The rerooting mechanism implicitly decomposes the search space into subtasks, leading to significant speedups. We prove that the number of node visits that $\sqrt{\text{LTS}}$ takes is competitive with the best decomposition into subtasks, at the price of a factor that relates to the uncertainty of the rerooter. If LTS takes time $T$, then in the best case with $q$ rerooting points, $\sqrt{\text{LTS}}$ takes only time $O(q\sqrt[q]{T})$. Like the policy, the rerooter can be learnt from data, and we expect $\sqrt{\text{LTS}}$ to be applicable to a wide range of domains.
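
To get a feel for the best-case figure $O(q\sqrt[q]{T})$, the following sketch evaluates $q\sqrt[q]{T}$ in base-2 logarithms. It is only an illustration: the value $T = 2^{100}$ and the function name `log2_best_case` are assumptions made here, and the constants hidden by the $O(\cdot)$ as well as the rerooter-uncertainty factor mentioned in the abstract are ignored.

```python
import math


def log2_best_case(log2_T: float, q: int) -> float:
    """Return log2(q * T**(1/q)) given log2(T).

    Working in logarithms avoids overflow and rounding issues for huge T.
    Constant factors hidden by the O(.) notation are ignored (assumption).
    """
    if q < 1:
        raise ValueError("q must be a positive integer")
    return math.log2(q) + log2_T / q


# Hypothetical LTS cost T = 2**100 (an assumed value for illustration only).
LOG2_T = 100.0
for q in (1, 2, 4):
    print(f"q={q}: about 2^{log2_best_case(LOG2_T, q):.0f} node visits")
```

With these assumed numbers, two rerooting points bring the exponent from 100 down to 51, and four bring it down to 27, which is the sense in which the speedup is exponential.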

Paper Structure

This paper contains 30 sections, 86 equations, 5 figures, 2 algorithms.

Figures (5)

  • Figure 1: A schematic representation of the binary tree of \ref{ex:1000clues} with four clue nodes $n_a, n_b, n_c, n_d$ at depth 50, and the solution node $n^*$ at depth 100.
  • Figure 2: The tree of \ref{ex:loose_dop} and \ref{ex:tight_sop}.
  • Figure 3: See \ref{ex:WTmax_vs_main_bound}. The cost of reaching $n_{T_2}$ from $n_1 = n_{T_1}$ using LTS (with $\tfrac{\lambda}{\pi}$ as the cost function) is $c^{\mathrm{r}}_{T_1}(n_{T_2}) = A$. The cost of reaching $n_{T_4}$ from $n_{T_3}$ is $c^{\mathrm{r}}_{T_3}(n_{T_4}) = B$, etc.
  • Figure 4: The $D$-chain environment. Edge labels are actions, and node labels are rewards. The binary tree is perfect and infinite. UCT, 'Polynomial' UCT, AlphaZero and other MCTS variants take time that is doubly exponential (or worse) in the depth of the node $n^*$ of highest reward.
  • Figure 5: A simple level of Sokoban. The player (the pumpkin) can move in all 4 directions (up, down, left, right) and must push all 4 boxes (brown squares) onto one goal spot each (blue diamond cells). Boxes cannot be pulled.
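
The setting of Figure 1 (a clue node halfway to the solution) can be imitated at toy scale. The sketch below is not $\sqrt{\text{LTS}}$: it hand-picks a single clue node as an oracle decomposition into two subtasks and compares that with one direct search. Everything specific here is an assumption for illustration: the uniform policy, the cost `depth / probability` (a simplified stand-in for the paper's $\tfrac{\lambda}{\pi}$), the depths 6 and 12 in place of 50 and 100, and the helper name `lts_visits`.

```python
import heapq
from itertools import count


def lts_visits(start: str, goal: str, max_visits: int = 10**6):
    """Count node visits of a best-first search from `start` until `goal`.

    Nodes are strings of binary actions. The policy is uniform (assumed),
    and nodes are ordered by depth / probability, both measured from `start`.
    """
    tie = count()  # insertion order breaks ties between equal costs
    frontier = [(0.0, next(tie), start, 0, 1.0)]
    visits = 0
    while frontier and visits < max_visits:
        _, _, node, depth, prob = heapq.heappop(frontier)
        visits += 1
        if node == goal:
            return visits
        for action in "01":
            child_depth, child_prob = depth + 1, prob * 0.5
            cost = child_depth / child_prob
            heapq.heappush(
                frontier,
                (cost, next(tie), node + action, child_depth, child_prob),
            )
    return None  # visit budget exhausted


HALF = 6                   # toy stand-in for depth 50
clue = "1" * HALF          # hypothetical clue node
solution = "1" * (2 * HALF)  # hypothetical solution node

direct_visits = lts_visits("", solution)
staged_visits = lts_visits("", clue) + lts_visits(clue, solution)
print("direct search:", direct_visits)
print("root -> clue -> solution:", staged_visits)
```

Under these assumptions the staged search visits far fewer nodes than the direct one, because each subtask only has to cover half the depth. The actual algorithm does not know the clue node in advance; it spreads effort over all candidate roots according to the rerooter's weights.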

Theorems & Definitions (20)

  • proof
  • proof
  • proof
  • proof
  • proof
  • proof
  • proof: Proof of \ref{ex:slend_left_right}
  • proof
  • proof
  • proof
  • ...and 10 more