GenPlan: Generative Sequence Models as Adaptive Planners

Akash Karthikeyan, Yash Vardhan Pant

TL;DR

GenPlan presents a discrete-flow, energy-guided denoising framework for long-horizon planning that learns to jointly model goals, states, and actions from offline demonstrations. By viewing planning as iterative denoising with a rate matrix and an energy objective, GenPlan enables dynamic goal generation and exploration, improving adaptation to unseen constraints and multi-task missions. Empirical results on BabyAI and continuous manipulation tasks show robust generalization, with GenPlan surpassing state-of-the-art baselines by over 10% on adaptive planning tasks. The approach offers a scalable, task-agnostic pathway to multi-goal planning in complex environments and highlights the value of energy-based, multi-modal trajectory generation for robust autonomous decision-making.

Abstract

Sequence models have demonstrated remarkable success in behavioral planning by leveraging previously collected demonstrations. However, solving multi-task missions remains a significant challenge, particularly when the planner must adapt to unseen constraints and tasks, such as discovering goals and unlocking doors. Such behavioral planning problems are challenging to solve due to: a) agents failing to adapt beyond the single task learned through their reward function, and b) inability to generalize to new environments, e.g., those with walls and locked doors, when trained only in planar environments. Consequently, state-of-the-art decision-making methods are limited to missions where the required tasks are well-represented in the training demonstrations and can be solved within a short (temporal) planning horizon. To address this, we propose GenPlan: a stochastic and adaptive planner that leverages discrete-flow models for generative sequence modeling, enabling sample-efficient exploration and exploitation. This framework relies on an iterative denoising procedure to generate a sequence of goals and actions. This approach captures multi-modal action distributions and facilitates goal and task discovery, thereby generalizing to out-of-distribution tasks and environments, i.e., missions not part of the training data. We demonstrate the effectiveness of our method through multiple simulation environments. Notably, GenPlan outperforms state-of-the-art methods by over 10% on adaptive planning tasks, where the agent adapts to multi-task missions while leveraging demonstrations from single-goal-reaching tasks. Our code is available at https://github.com/CL2-UWaterloo/GenPlan.
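
As a rough illustration of the iterative denoising procedure mentioned in the abstract, the following minimal sketch starts from a fully corrupted sequence and unmasks it over a fixed number of steps. It assumes a masking-style corruption, which is one common construction for discrete flows; the excerpt above does not specify GenPlan's actual corruption process, rate matrix, or energy guidance, and none of those are reproduced here. The names `MASK`, `toy_denoiser`, and `denoise_plan` and the action vocabulary are hypothetical, and the uniform denoiser merely stands in for a learned model.

```python
import random

MASK = "<mask>"  # hypothetical placeholder for a corrupted token


def toy_denoiser(sequence, position, vocab):
    """Hypothetical stand-in for a learned denoiser: uniform weights over vocab."""
    return [1.0 / len(vocab)] * len(vocab)


def denoise_plan(length, vocab, denoiser=toy_denoiser, steps=10, seed=0):
    """Iteratively unmask a fully corrupted sequence of discrete tokens."""
    rng = random.Random(seed)
    sequence = [MASK] * length
    for i in range(steps):
        # Euler step for a masking process: with t = i / steps and
        # dt = 1 / steps, the unmask probability dt / (1 - t)
        # simplifies to 1 / (steps - i), which equals 1 on the last step.
        unmask_prob = 1.0 / (steps - i)
        for pos in range(length):
            if sequence[pos] == MASK and rng.random() < unmask_prob:
                # Sample a clean token from the denoiser's predicted weights,
                # conditioned on the partially denoised sequence so far.
                weights = denoiser(sequence, pos, vocab)
                sequence[pos] = rng.choices(vocab, weights=weights, k=1)[0]
    return sequence


actions = ["left", "right", "forward", "pickup", "toggle"]
plan = denoise_plan(length=8, vocab=actions)
print(plan)
```

Because the unmask probability reaches 1 on the final step, every position is resolved by the end of the loop. In a real planner the denoiser would condition on the current state and jointly predict goals and actions rather than drawing uniformly.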

Paper Structure

This paper contains 56 sections, 7 equations, 8 figures, 20 tables, 2 algorithms.

Figures (8)

  • Figure 1: Overview. GenPlan is a generative, multi-step planner that optimizes an energy landscape to adapt to complex tasks and iteratively refine long-horizon missions. Goals are highlighted in yellow, and distractors are marked in red.
  • Figure 2: Method Overview. GenPlan, trained on offline data (A), learns to jointly model action, goal, and state distributions. In (B), the joint denoising model (see Section \ref{sec:joint_model}) takes in a corrupted trajectory $\tau^t$ and predicts the clean trajectory $\tau^1$. (C) demonstrates the joint inference of goals and actions by simulating the reverse process, as detailed in Algorithm \ref{alg:sampling}.
  • Figure 3: Energy Landscape. GenPlan, when conditioned on sub-goals, implicitly assigns minimal energy to necessary sub-goals (e.g., picking up keys, opening doors) for task completion. States closer to the white region are more likely to be transitioned into. LEAP, in contrast, does not prioritize these sub-tasks.
  • Figure 4: Trajectory Planning (TP). The agent is randomly initialized and must navigate through a maze-like environment to reach the goal(s). In each evaluation, the map layout, the agent's initial position, and goal positions are varied. Note that in the case of GenPlan, the agent does not have access to the ground truth goal positions, whereas other baselines do.
  • Figure 7: GenPlan I/O. During planning, the model takes in a corrupted trajectory along with the current state and past observations (if available), and iteratively recovers the clean trajectory.
  • ...and 3 more figures

Theorems & Definitions (3)

  • Example 1
  • Remark 1
  • Remark 2