Extendable Planning via Multiscale Diffusion
Chang Chen, Hany Hamed, Doojin Baek, Taegu Kang, Samyeul Noh, Yoshua Bengio, Sungjin Ahn
TL;DR
The paper addresses extendable long-horizon planning for diffusion-based planners, which are typically constrained by training trajectory lengths. It introduces a two-phase framework: Progressive Trajectory Extension (PTE) to synthesize much longer trajectories via multi-round compositional stitching, and Hierarchical Multiscale Diffuser (HM-Diffuser) to enable efficient planning across temporal scales, aided by Adaptive Plan Pondering (APP) and a Recursive HM-Diffuser. The authors also present the Plan Extendable Trajectory Suite (PETS) benchmark and demonstrate that HM-Diffuser-X trained on PTE-extended data achieves strong performance across Extendable Maze2D, Extendable Franka Kitchen, and Extendable Gym-MuJoCo, with ablations confirming the benefits of multiscale planning and data augmentation. This work advances scalable long-horizon decision-making in diffusion-based planning and shows promise for offline RL settings, though it notes limitations such as stitching quality, lack of visual inputs, and the need for test-time search refinements.
Abstract
Long-horizon planning is crucial in complex environments, but diffusion-based planners like Diffuser are limited by the trajectory lengths observed during training. This creates a dilemma: long trajectories are needed for effective planning, yet they degrade model performance. In this paper, we introduce this extendable long-horizon planning challenge and propose a two-phase solution. First, Progressive Trajectory Extension incrementally constructs longer trajectories through multi-round compositional stitching. Second, the Hierarchical Multiscale Diffuser enables efficient training and inference over long horizons by reasoning across temporal scales. To avoid the need for multiple separate models, we propose Adaptive Plan Pondering and the Recursive HM-Diffuser, which unify hierarchical planning within a single model. Experiments show our approach yields strong performance gains, advancing scalable and efficient decision-making over long horizons.
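To make the idea of multi-round compositional stitching concrete, the following is a minimal toy sketch, not the paper's procedure. It assumes trajectories are NumPy arrays of states and joins two trajectories whenever the end state of one lies within a tolerance of the start state of another. The function names `stitch_once` and `progressive_extension`, the `tol` parameter, and the nearest-endpoint matching rule are all hypothetical choices made for illustration; the abstract does not specify how stitching is actually performed.

```python
import numpy as np

def stitch_once(pool, base, tol):
    """Try to extend each trajectory in `pool` by one trajectory from `base`.

    Toy rule (an assumption, not the paper's method): append the base
    trajectory whose start state is closest to the current end state,
    provided the gap is at most `tol`. Otherwise leave it unchanged.
    """
    extended = []
    for traj in pool:
        end_state = traj[-1]
        gaps = [np.linalg.norm(b[0] - end_state) for b in base]
        j = int(np.argmin(gaps))
        if gaps[j] <= tol:
            extended.append(np.concatenate([traj, base[j]], axis=0))
        else:
            extended.append(traj)
    return extended

def progressive_extension(base, rounds, tol=0.1):
    """Run several stitching rounds, so trajectories can grow each round."""
    pool = list(base)
    for _ in range(rounds):
        pool = stitch_once(pool, base, tol)
    return pool

# Three short 1-D segments covering [0, 1], [1, 2], and [2, 3].
base = [np.linspace(k, k + 1, 5).reshape(-1, 1) for k in range(3)]
extended = progressive_extension(base, rounds=2)
print([len(t) for t in extended])  # [15, 10, 5]
```

After two rounds, the first segment has been chained through both of its successors and is three times its original length, while the last segment has no successor and stays unchanged. This shows how repeated rounds can yield trajectories much longer than any in the original data.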
