Subgoal Search For Complex Reasoning Tasks

Konrad Czechowski, Tomasz Odrzygóźdź, Marek Zbysiński, Michał Zawalski, Krzysztof Olejnik, Yuhuai Wu, Łukasz Kuciński, Piotr Miłoś

TL;DR

Subgoal Search (kSubS) combines a learnable subgoal generator with classical planners to solve complex reasoning problems by planning over subgoals rather than atomic actions. A transformer-based generator predicts subgoals $k$ steps ahead. One of two search backends, Best-First Search or Monte Carlo Tree Search, uses these subgoals to build a high-level subgoal graph that reduces search breadth while maintaining solution quality. Empirical results on INT, Sokoban, and Rubik's Cube show substantial performance gains and favorable wall-clock times. These include state-of-the-art results on INT, near-perfect Rubik's Cube solving, and evidence of out-of-distribution generalization. The work suggests that planning over high-level subgoals can mitigate value-function errors and enable scaling to harder reasoning tasks. It also discusses limitations and avenues for future work, such as unsupervised planning loops and broader environments.
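To make "reduces search breadth" concrete, here is an idealized back-of-envelope count. It is an illustration under stated assumptions, not an analysis from the paper. Assume exhaustive search, a uniform atomic branching factor $b$, a solution of length $d$ divisible by $k$, and $n$ generated subgoals per node. Also assume that each subgoal is connected to its parent by a low-level search of depth at most $k$:

```latex
\[
N_{\text{atomic}} = \sum_{i=0}^{d} b^{i} = O\!\left(b^{d}\right),
\qquad
N_{\text{subgoal}} = O\!\left(n^{d/k}\right) \text{ high-level nodes, each connected at cost } O\!\left(b^{k}\right)
\;\Rightarrow\; O\!\left(n^{d/k}\, b^{k}\right).
\]
% Hypothetical values b = 4, d = 12, k = 4, n = 3:
% b^d = 4^{12} = 16{,}777{,}216 \quad\text{vs.}\quad n^{d/k}\, b^{k} = 3^{3} \cdot 4^{4} = 27 \cdot 256 = 6{,}912.
```

The comparison is optimistic because it assumes every generated subgoal is valid and lies on a solution path. A value-guided best-first search also need not expand either tree exhaustively, so neither bound describes the observed cost.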

Abstract

Humans excel at solving complex reasoning tasks through a mental process of moving from one idea to a related one. Inspired by this, we propose the Subgoal Search (kSubS) method. Its key component is a learned subgoal generator that produces diverse subgoals that are both achievable and closer to the solution. Using subgoals reduces the search space and induces a high-level search graph suitable for efficient planning. In this paper, we implement kSubS using a transformer-based subgoal module coupled with the classical best-first search framework. We show that a simple approach of generating subgoals $k$ steps ahead is surprisingly efficient on three challenging domains: two popular puzzle games, Sokoban and the Rubik's Cube, and the inequality-proving benchmark INT. kSubS achieves strong results, including state-of-the-art performance on INT within a modest computational budget.
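The search loop described in this abstract can be sketched as best-first search whose expansions are subgoals rather than atomic actions. This is a minimal, hypothetical sketch, not the authors' implementation. The following are all assumptions made for illustration:

- the toy integer domain with `inc`, `dec`, and `dbl` actions;
- the oracle `oracle_subgoals`, which stands in for the transformer-based generator;
- the distance-based `value` function;
- the BFS-based reachability check `k_step_path`.

```python
import heapq
from collections import deque
from typing import Callable, Optional

State = int

# Hypothetical toy domain: integer states and three atomic actions.
ACTIONS: dict[str, Callable[[State], State]] = {
    "inc": lambda s: s + 1,
    "dec": lambda s: s - 1,
    "dbl": lambda s: s * 2,
}


def reachable_within(s: State, k: int) -> dict[State, int]:
    """BFS distances (in atomic steps) to every state reachable in at most k steps."""
    dist = {s: 0}
    frontier = deque([s])
    while frontier:
        u = frontier.popleft()
        if dist[u] == k:
            continue
        for f in ACTIONS.values():
            v = f(u)
            if v not in dist:
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist


def oracle_subgoals(s: State, k: int, n: int, value: Callable[[State], float]) -> list[State]:
    """Stand-in for the learned generator (assumption): propose the n
    highest-value states reachable within k atomic steps of s."""
    candidates = [v for v in reachable_within(s, k) if v != s]
    return sorted(candidates, key=value, reverse=True)[:n]


def k_step_path(start: State, target: State, k: int) -> Optional[list[str]]:
    """Low-level reachability check (assumption): shortest action sequence
    of length <= k from start to target, or None if there is none."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        s, path = frontier.popleft()
        if s == target:
            return path
        if len(path) == k:
            continue
        for name, f in ACTIONS.items():
            nxt = f(s)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [name]))
    return None


def bf_ksubs(start: State, goal: State, k: int = 3, n: int = 4,
             budget: int = 1000) -> Optional[list[str]]:
    """Best-first search over subgoals; returns an atomic action plan or None."""
    value = lambda s: -abs(goal - s)  # hypothetical value function
    heap = [(-value(start), 0, start)]  # (priority, tie-breaker, state)
    parent: dict[State, tuple[Optional[State], list[str]]] = {start: (None, [])}
    pushes = 0
    expansions = 0
    while heap and expansions < budget:
        _, _, s = heapq.heappop(heap)
        if s == goal:
            plan: list[str] = []
            while parent[s][0] is not None:  # walk subgoal edges back to start
                prev, segment = parent[s]
                plan = segment + plan
                s = prev
            return plan
        expansions += 1
        for g in oracle_subgoals(s, k, n, value):
            if g in parent:
                continue
            segment = k_step_path(s, g, k)
            if segment is None:  # discard subgoals the low level cannot reach
                continue
            parent[g] = (s, segment)
            pushes += 1
            heapq.heappush(heap, (-value(g), pushes, g))
    return None


if __name__ == "__main__":
    print(bf_ksubs(1, 10, k=3, n=4))
```

A learned generator may propose subgoals that are unreachable or that lead away from the solution. For that reason, the sketch checks each candidate with `k_step_path` before adding it to the subgoal graph. With the oracle used here, that check always succeeds.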

Paper Structure

This paper contains 45 sections, 8 equations, 10 figures, 6 tables, and 10 algorithms.

Figures (10)

  • Figure 1: The performance of Subgoal Search. (top, left) Comparison with AlphaZero on INT (proof length 15). (top, right) BF-kSubS consistently achieves high performance even for small computational budgets. (bottom, left) Similarly, on Sokoban (12×12 board with 4 boxes), the advantage of BF-kSubS is clearly visible for small budgets. (bottom, right) BestFS fails to solve the Rubik's Cube, while BF-kSubS can achieve near-perfect performance.
  • Figure 2: BF-kSubS success rates for different values of $k$. Black curves represent the values of $k$ used in the main experiments (i.e., $k=4$ for Rubik's Cube and Sokoban, and $k=3$ for INT).
  • Figure 3: Histogram of $\Delta$. Note that 17% of subgoals increase the distance. Additionally, 5% lead to unsolvable "dead states" present in Sokoban.
  • Figure 4: Out-of-distribution generalization to longer proofs. We compare with the behavioral cloning agent (Policy) studied in wu2020int.
  • Figure 5: A detailed view of subgoal generation for Sokoban. Arrows represent the probabilities of given modifications. Final subgoals are located in the leaves.
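The Figure 3 caption can be read under an assumption that the caption itself does not state. On that reading, $\Delta$ is the change in distance to the solution when moving from a state to a generated subgoal. $\Delta > 0$ then means the subgoal increases the distance, and an infinite distance marks a dead state. The minimal sketch below tabulates such a histogram from hypothetical (distance before, distance after) pairs. The function name `delta_stats` and the toy pairs are illustrative only, not data from the paper.

```python
import math
from collections import Counter


def delta_stats(pairs: list[tuple[float, float]]) -> tuple[Counter, float, float]:
    """Tabulate Delta = d(subgoal) - d(state) over (d_state, d_subgoal) pairs.

    Assumption: d is the distance to the solution, and math.inf marks a dead
    (unsolvable) state. Returns (histogram of finite Delta values,
    fraction of subgoals with Delta > 0, fraction of dead-state subgoals).
    """
    hist: Counter = Counter()
    if not pairs:
        return hist, 0.0, 0.0
    dead = 0
    for d_state, d_sub in pairs:
        if math.isinf(d_sub):
            dead += 1  # dead states are counted separately, not binned
        else:
            hist[d_sub - d_state] += 1
    total = len(pairs)
    worse = sum(count for delta, count in hist.items() if delta > 0)
    return hist, worse / total, dead / total


# Made-up example pairs, not data from the paper.
toy = [(10, 6), (10, 7), (10, 11), (8, math.inf)]
hist, frac_worse, frac_dead = delta_stats(toy)
print(dict(hist), frac_worse, frac_dead)
```

Counting dead states apart from the finite bins mirrors the caption's wording, which reports the distance-increasing subgoals and the dead-state subgoals as separate figures.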