What Matters in Hierarchical Search for Combinatorial Reasoning Problems?
Michał Zawalski, Gracjan Góral, Michał Tyrolski, Emilia Wiśnios, Franciszek Budrowski, Marek Cygan, Łukasz Kuciński, Piotr Miłoś
TL;DR
The paper analyzes when hierarchical subgoal search offers tangible advantages over traditional low-level planners on challenging combinatorial reasoning tasks. By training all components with imitation learning on large, diverse datasets and evaluating on multiple NP-hard environments, it shows that subgoal methods excel when value estimates are noisy, action spaces are complex, and dead ends are prevalent, while their edge diminishes with homogeneous data or excessively long subgoals. The study introduces a consistent evaluation framework and presents both empirical and theoretical results (including the search-advancement and action-densification analyses) that explain the observed performance gaps. The findings provide practical guidelines for when to deploy hierarchical search, emphasize fair reporting of baselines, and offer a foundation for future theoretical and empirical work. The work has implications for robotics and long-horizon planning, where data diversity and distribution shifts are common challenges.
Abstract
Efficiently tackling combinatorial reasoning problems, particularly the notorious NP-hard tasks, remains a significant challenge for AI research. Recent efforts have sought to enhance planning by incorporating hierarchical high-level search strategies, known as subgoal methods. While promising, their performance against traditional low-level planners is inconsistent, raising questions about the contexts in which they should be applied. In this study, we conduct an in-depth exploration of subgoal-planning methods for combinatorial reasoning. We identify the attributes pivotal for leveraging the advantages of high-level search: hard-to-learn value functions, complex action spaces, the presence of dead ends in the environment, or the use of data collected from diverse experts. We propose a consistent evaluation methodology to achieve meaningful comparisons between methods and reevaluate the state-of-the-art algorithms.
