Adaptformer: Sequence models as adaptive iterative planners

Akash Karthikeyan, Yash Vardhan Pant

TL;DR

Adaptformer addresses long-horizon, multi-task planning under sparse rewards by introducing a stochastic adaptive planner that learns an energy-based trajectory model $E_\theta$ and a goal-conditioned policy. It combines a goal-augmentation module, a state discriminator for sub-goal learning, and an energy-based trajectory generator to enable online planning via iterative energy minimization and Gibbs sampling. Key contributions include an intrinsic sub-goal curriculum for generalization to unseen goals, a discriminative mechanism that keeps generated trajectories in-distribution, and empirical results showing improvements of up to 25% over the state of the art in multi-goal navigation tasks, as well as the ability to adapt from single-goal demonstrations. The method demonstrates strong generalization, robustness to distractors, and practical applicability to real robotic systems, with planned future work on online fine-tuning and information gathering under limited perception.

Abstract

Despite recent advances in learning-based behavioral planning for autonomous systems, decision-making in multi-task missions remains a challenging problem. For instance, a mission might require a robot to explore an unknown environment, locate the goals, and navigate to them, even if there are obstacles along the way. Such problems are difficult to solve due to: a) sparse rewards, meaning a reward signal is available only once all the tasks in a mission have been satisfied, and b) the agent having to perform tasks at run-time that are not covered in the training data, e.g., demonstrations only from an environment where all doors were unlocked. Consequently, state-of-the-art decision-making methods in such settings are limited to missions where the required tasks are well-represented in the training demonstrations and can be solved within a short planning horizon. To overcome these limitations, we propose Adaptformer, a stochastic and adaptive planner that utilizes sequence models for sample-efficient exploration and exploitation. This framework relies on learning an energy-based heuristic, which needs to be minimized over a sequence of high-level decisions. To generate successful action sequences for long-horizon missions, Adaptformer aims to achieve shorter sub-goals, which are proposed through an intrinsic sub-goal curriculum. Through these two key components, Adaptformer allows for generalization to out-of-distribution tasks and environments, i.e., missions that were not a part of the training data. Empirical results in multiple simulation environments demonstrate the effectiveness of our method. Notably, Adaptformer not only outperforms the state-of-the-art method by up to 25% in multi-goal maze reachability tasks but also successfully adapts to multi-task missions that the state-of-the-art method could not complete, leveraging demonstrations from single-goal-reaching tasks.
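
For orientation, the "energy-based heuristic ... minimized over a sequence of high-level decisions" can be written in a generic form that energy-based sequence planners commonly use. This is a sketch under assumed notation: the trajectory $\tau$, goal $g$, action $a_t$ at step $t$, and temperature $T$ are introduced here for illustration and are not taken from the paper, whose exact objective may differ.

```latex
% Generic energy-based planning objective (illustrative notation, not the paper's).
% Planning: find the action sequence with the lowest energy for goal g.
\tau^{*} = \arg\min_{\tau} E_\theta(\tau \mid g)
% Gibbs-style update: resample one step while holding the others fixed.
p_\theta(a_t \mid \tau_{\setminus t}, g) \propto \exp\!\left(-E_\theta(\tau \mid g) / T\right)
```

Under this reading, lower temperatures make each resampling step closer to a greedy minimization, while higher temperatures keep the planner stochastic.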

Paper Structure

This paper contains 17 sections, 6 equations, 6 figures, 2 tables, 2 algorithms.

Figures (6)

  • Figure 1: Multi-Task Mission Adaptation. Adaptformer plans a goal-conditioned trajectory that addresses several key challenges: (1) recognizing and executing implicit subtasks ($1\rightarrow4$) in long-horizon missions, (2) generalizing to tasks involving multiple goals, and (3) adaptive skill learning (i.e., unblocking pathways) using an iterative stochastic policy. Goals are highlighted in yellow, while distractors are marked in red.
  • Figure 2: Method Overview. Adaptformer, trained on offline data (A), incorporates a Goal Augmentation module that outputs a set of waypoints (B). Concurrently, the energy module is trained to assign lower energy to an optimal set of actions (C). Training alternates gradient updates between the generator and the discriminator (D), encouraging the policy to learn diverse representations. At inference, the system uses the learned stochastic policy to query the masked trajectory sequence (E), which is then refined through iterative energy minimization (F), framing path planning as an optimization problem.
  • Figure 3: Environments. The simulations for both types of missions, (1) and (2) in Section \ref{sec:setup}, were conducted in the following mazes: the first five were sourced from BabyAI, while the final one was from MiniGrid.
  • Figure 4: Number of Training Demonstrations vs. Success Rates. We report the mean and standard deviation of success rates for the GoToObjMazeS4G2 task. Note that Adaptformer outperforms LEAP in mean success rates and shows lower variance.
  • Figure 5: Energy Landscape. When conditioned on sub-goals, Adaptformer learns to implicitly assign minimum energy values to the sub-goals (pick up key, open doors) required for task completion. States closer to the white (low-energy) region are more likely to be transitioned to, indicating a higher probability of moving toward these preferred states. In contrast, LEAP does not pick up on the sub-tasks associated with the task.
  • ...and 1 more figure
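
The inference steps named in the Figure 2 caption (querying a masked sequence, then refining it by iterative energy minimization) can be illustrated with a minimal Gibbs-style sketch. Everything here is an assumption made for illustration: `toy_energy` is a hand-written stand-in for the learned energy model, the 1-D world and the `ACTIONS` set are hypothetical, and `gibbs_refine` is not the paper's algorithm.

```python
import math
import random

ACTIONS = (-1, 0, 1)  # hypothetical 1-D moves: left, stay, right


def toy_energy(actions, start, goal):
    """Hand-written stand-in for a learned energy: lower means a better plan."""
    position = start + sum(actions)
    effort = sum(abs(a) for a in actions)
    return abs(position - goal) + 0.01 * effort


def gibbs_refine(actions, start, goal, sweeps=20, temperature=0.1, seed=0):
    """Resample one action at a time from a Boltzmann distribution over energies."""
    rng = random.Random(seed)
    plan = list(actions)
    best_plan, best_energy = list(plan), toy_energy(plan, start, goal)
    for _ in range(sweeps):
        for t in range(len(plan)):
            # Energy of the whole plan for each candidate action at step t.
            energies = []
            for a in ACTIONS:
                plan[t] = a
                energies.append(toy_energy(plan, start, goal))
            # Softmax over negative energies, shifted for numerical stability.
            low = min(energies)
            weights = [math.exp(-(e - low) / temperature) for e in energies]
            plan[t] = rng.choices(ACTIONS, weights=weights, k=1)[0]
            # Track the lowest-energy plan seen so far.
            energy = toy_energy(plan, start, goal)
            if energy < best_energy:
                best_plan, best_energy = list(plan), energy
    return best_plan, best_energy


# An all-"stay" plan stands in for the masked sequence that gets refined.
plan, energy = gibbs_refine([0] * 8, start=0, goal=5)
print(plan, energy)
```

The sketch keeps the best plan seen rather than the last sample, so the returned energy never exceeds that of the initial plan. A learned energy model would replace `toy_energy`, and the per-step candidates would come from the policy rather than a fixed action set.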

Theorems & Definitions (1)

  • Example 1