Subgoal Search For Complex Reasoning Tasks

Konrad Czechowski; Tomasz Odrzygóźdź; Marek Zbysiński; Michał Zawalski; Krzysztof Olejnik; Yuhuai Wu; Łukasz Kuciński; Piotr Miłoś

TL;DR

Subgoal Search (kSubS) combines a learnable subgoal generator with classical planners to solve complex reasoning problems by planning over subgoals rather than atomic actions. A transformer-based generator predicts subgoals $k$ steps ahead, and one of two planning backends, Best-First Search or Monte Carlo Tree Search, uses them to build a high-level subgoal graph that reduces search breadth while maintaining solution quality. Empirical results on INT, Sokoban, and the Rubik's Cube show substantial performance gains and favorable wall-clock times, including state-of-the-art results on INT and near-perfect Rubik's Cube solving, with evidence of out-of-distribution generalization. The work suggests that high-level subgoals can mitigate value-function errors and enable scaling to harder reasoning tasks. It also details limitations and avenues for future improvement, such as unsupervised planning loops and broader environments.

Abstract

Humans excel at solving complex reasoning tasks through a mental process of moving from one idea to a related one. Inspired by this, we propose the Subgoal Search (kSubS) method. Its key component is a learned subgoal generator that produces a diverse set of subgoals that are both achievable and closer to the solution. Using subgoals reduces the search space and induces a high-level search graph suitable for efficient planning. In this paper, we implement kSubS using a transformer-based subgoal module coupled with the classical best-first search framework. We show that a simple approach of generating subgoals $k$ steps ahead is surprisingly efficient on three challenging domains: two popular puzzle games, Sokoban and the Rubik's Cube, and an inequality-proving benchmark, INT. kSubS achieves strong results, including state-of-the-art on INT, within a modest computational budget.
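
To make the idea of best-first search over subgoals concrete, the following is a minimal sketch on a toy domain, not the paper's implementation. Everything in it is an assumption made for illustration: the integer state space with `inc`/`dec`/`dbl` actions, the function names, the hand-written `propose_subgoals` (a brute-force stand-in for the learned transformer generator), the distance heuristic (a stand-in for a learned value function), and the bounded breadth-first `low_level_path` (a stand-in for whatever low-level component connects a state to a subgoal).

```python
import heapq
from collections import deque

# Hypothetical toy domain: states are integers, three atomic actions.
ACTIONS = {"inc": lambda s: s + 1, "dec": lambda s: s - 1, "dbl": lambda s: s * 2}


def low_level_path(start, goal, k):
    """Breadth-first search limited to k atomic actions; returns action names or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        if len(path) == k:
            continue  # do not expand beyond k atomic steps
        for name, fn in ACTIONS.items():
            nxt = fn(state)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [name]))
    return None


def propose_subgoals(state, target, k, width):
    """Stand-in for a learned generator: the `width` states within k steps closest to target."""
    reachable, frontier = set(), {state}
    for _ in range(k):
        frontier = {fn(s) for s in frontier for fn in ACTIONS.values()}
        reachable |= frontier
    reachable.discard(state)
    return sorted(reachable, key=lambda s: (abs(target - s), s))[:width]


def subgoal_search(start, target, k=3, width=3, max_nodes=1000):
    """Best-first search whose nodes are subgoals rather than single-action successors."""
    heap = [(abs(target - start), start)]  # priority = heuristic distance to target
    parent = {start: None}
    expanded = 0
    while heap and expanded < max_nodes:
        _, state = heapq.heappop(heap)
        expanded += 1
        if state == target:
            # Rebuild the subgoal chain, then fill in atomic actions between subgoals.
            chain = []
            while state is not None:
                chain.append(state)
                state = parent[state]
            chain.reverse()
            actions = []
            for a, b in zip(chain, chain[1:]):
                actions += low_level_path(a, b, k)
            return chain, actions
        for sub in propose_subgoals(state, target, k, width):
            if sub in parent:
                continue  # already in the subgoal graph
            if low_level_path(state, sub, k) is None:
                continue  # discard subgoals that cannot be reached within k steps
            parent[sub] = state
            heapq.heappush(heap, (abs(target - sub), sub))
    return None


chain, actions = subgoal_search(1, 37)
print("subgoal chain:", chain)
print("atomic actions:", actions)
```

In this sketch the search graph has at most `width` children per node, each up to `k` atomic steps away, which is the sense in which planning over subgoals narrows the search compared with expanding every atomic action at every state.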
