Adaptformer: Sequence models as adaptive iterative planners
Akash Karthikeyan, Yash Vardhan Pant
TL;DR
Adaptformer addresses long-horizon, multi-task planning under sparse rewards with a stochastic adaptive planner that learns an energy-based trajectory model $E_\theta$ and a goal-conditioned policy. It combines a goal-augmentation module, a state discriminator for sub-goal learning, and an energy-based trajectory generator to enable online planning via iterative energy minimization and Gibbs sampling. Key contributions include an intrinsic sub-goal curriculum for generalization to unseen goals, a discriminative mechanism that keeps generated trajectories in-distribution, and empirical results showing improvements of up to 25% over the state of the art in multi-goal navigation tasks, as well as the ability to adapt to multi-task missions from single-goal demonstrations. The method demonstrates strong generalization, robustness to distractors, and practical applicability to real robotic systems. Planned future work covers online fine-tuning and information gathering under limited perception.
Abstract
Despite recent advances in learning-based behavioral planning for autonomous systems, decision-making in multi-task missions remains a challenging problem. For instance, a mission might require a robot to explore an unknown environment, locate the goals, and navigate to them, even if there are obstacles along the way. Such problems are difficult to solve due to: a) sparse rewards, meaning a reward signal is available only once all the tasks in a mission have been satisfied, and b) the agent having to perform tasks at run-time that are not covered in the training data, e.g., demonstrations only from an environment where all doors were unlocked. Consequently, state-of-the-art decision-making methods in such settings are limited to missions where the required tasks are well-represented in the training demonstrations and can be solved within a short planning horizon. To overcome these limitations, we propose Adaptformer, a stochastic and adaptive planner that utilizes sequence models for sample-efficient exploration and exploitation. This framework relies on learning an energy-based heuristic, which needs to be minimized over a sequence of high-level decisions. To generate successful action sequences for long-horizon missions, Adaptformer aims to achieve shorter sub-goals, which are proposed through an intrinsic sub-goal curriculum. Through these two key components, Adaptformer allows for generalization to out-of-distribution tasks and environments, i.e., missions that were not a part of the training data. Empirical results in multiple simulation environments demonstrate the effectiveness of our method. Notably, Adaptformer not only outperforms the state-of-the-art method by up to 25% in multi-goal maze reachability tasks but also successfully adapts to multi-task missions that the state-of-the-art method could not complete, leveraging demonstrations from single-goal-reaching tasks.
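To make the idea of minimizing an energy over a sequence of high-level decisions more concrete, the following is a minimal toy sketch of iterative energy minimization with Gibbs-style resampling. It is not the Adaptformer implementation: the function `toy_energy` is a hypothetical hand-written stand-in for a learned energy model, the 1-D environment, the action set, and the parameters `sweeps` and `temperature` are all assumptions chosen for illustration.

```python
import math
import random


def toy_energy(actions, goal):
    """Hypothetical stand-in for a learned energy (assumption, not the paper's model).

    The agent starts at position 0 on a line; each action moves it by -1, 0, or +1.
    Energy is the distance between the final position and the goal.
    """
    return abs(goal - sum(actions))


def gibbs_plan(horizon, goal, action_set=(-1, 0, 1), sweeps=20,
               temperature=0.5, seed=0):
    """Refine an action sequence by resampling one step at a time.

    Each step is resampled from a Boltzmann distribution over the energies
    obtained by trying every action at that step with the others held fixed.
    The lowest-energy sequence seen so far is tracked and returned.
    """
    rng = random.Random(seed)
    actions = [rng.choice(action_set) for _ in range(horizon)]
    best, best_energy = list(actions), toy_energy(actions, goal)

    for _ in range(sweeps):
        for t in range(horizon):
            # Energy of each candidate action at step t, other steps fixed.
            energies = [
                toy_energy(actions[:t] + [a] + actions[t + 1:], goal)
                for a in action_set
            ]
            # Boltzmann weights; subtracting the minimum keeps exp() stable.
            min_e = min(energies)
            weights = [math.exp(-(e - min_e) / temperature) for e in energies]
            actions[t] = rng.choices(action_set, weights=weights, k=1)[0]

            energy = toy_energy(actions, goal)
            if energy < best_energy:
                best, best_energy = list(actions), energy

    return best, best_energy


if __name__ == "__main__":
    plan, energy = gibbs_plan(horizon=8, goal=5)
    print("plan:", plan, "energy:", energy)
```

In this sketch, lowering `temperature` makes the resampling greedier, while raising it encourages exploration of alternative sequences. In a learned setting, the hand-written energy would presumably be replaced by a trained model that scores whole trajectories.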
