GenPlan: Generative Sequence Models as Adaptive Planners
Akash Karthikeyan, Yash Vardhan Pant
TL;DR
GenPlan is a discrete-flow, energy-guided denoising framework for long-horizon planning that learns to jointly model goals, states, and actions from offline demonstrations. By treating planning as iterative denoising with a rate matrix and an energy objective, GenPlan supports dynamic goal generation and exploration, which improves adaptation to unseen constraints and multi-task missions. Empirical results on BabyAI and continuous manipulation tasks show robust generalization, with GenPlan outperforming state-of-the-art baselines by over 10% on adaptive planning tasks. The approach offers a scalable, task-agnostic route to multi-goal planning in complex environments and highlights the value of energy-based, multi-modal trajectory generation for robust autonomous decision-making.
Abstract
Sequence models have demonstrated remarkable success in behavioral planning by leveraging previously collected demonstrations. However, solving multi-task missions remains a significant challenge, particularly when the planner must adapt to unseen constraints and tasks, such as discovering goals and unlocking doors. Such behavioral planning problems are challenging to solve because: a) agents fail to adapt beyond the single task learned through their reward function, and b) agents cannot generalize to new environments, e.g., those with walls and locked doors, when trained only in planar environments. Consequently, state-of-the-art decision-making methods are limited to missions where the required tasks are well-represented in the training demonstrations and can be solved within a short (temporal) planning horizon. To address this, we propose GenPlan: a stochastic and adaptive planner that leverages discrete-flow models for generative sequence modeling, enabling sample-efficient exploration and exploitation. The framework relies on an iterative denoising procedure to generate a sequence of goals and actions. This approach captures multi-modal action distributions and facilitates goal and task discovery, thereby generalizing to out-of-distribution tasks and environments, i.e., missions not part of the training data. We demonstrate the effectiveness of our method across multiple simulation environments. Notably, GenPlan outperforms state-of-the-art methods by over 10% on adaptive planning tasks, where the agent adapts to multi-task missions while leveraging demonstrations from single-goal-reaching tasks. Our code is available at https://github.com/CL2-UWaterloo/GenPlan.
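To make the idea of iterative denoising over a discrete sequence concrete, the following is a minimal toy sketch in Python. It is not GenPlan's implementation: it swaps the rate-matrix formulation for a simpler mask-and-unmask schedule, and it approximates energy guidance with best-of-N selection. The names `toy_denoiser`, `toy_energy`, and `denoise`, the uniform proposal distribution, and the smoothness energy are all hypothetical and chosen only for illustration.

```python
import random

MASK = -1  # placeholder token for a position that has not been denoised yet


def toy_denoiser(seq, num_tokens):
    """Hypothetical stand-in for a learned model: returns one categorical
    distribution over tokens per position. Here it is simply uniform."""
    return [[1.0 / num_tokens] * num_tokens for _ in seq]


def toy_energy(seq):
    """Hypothetical energy: counts adjacent unmasked positions whose tokens
    differ, so smoother sequences get lower energy."""
    return sum(1 for a, b in zip(seq, seq[1:]) if MASK not in (a, b) and a != b)


def denoise(length, num_tokens, steps, candidates, rng):
    """Start fully masked, then unmask a share of positions at every step,
    keeping the lowest-energy proposal among `candidates` samples."""
    seq = [MASK] * length
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        # Ceiling division: spread the remaining positions over remaining steps.
        k = -(-len(masked) // (steps - step))
        probs = toy_denoiser(seq, num_tokens)
        best = None
        for _ in range(candidates):
            proposal = list(seq)
            for i in rng.sample(masked, k):
                proposal[i] = rng.choices(range(num_tokens), weights=probs[i])[0]
            if best is None or toy_energy(proposal) < toy_energy(best):
                best = proposal
        seq = best
    return seq


plan = denoise(length=8, num_tokens=4, steps=4, candidates=5, rng=random.Random(0))
print(plan)  # a fully unmasked sequence of 8 tokens drawn from {0, 1, 2, 3}
```

In this sketch the sequence could stand for actions or goals; a learned model would replace the uniform proposals, and a task-specific energy would replace the smoothness count.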
