Hyper-Decision Transformer for Efficient Online Policy Adaptation
Mengdi Xu, Yuchen Lu, Yikang Shen, Shun Zhang, Ding Zhao, Chuang Gan
TL;DR
This work introduces the Hyper-Decision Transformer (HDT), a parameter-efficient framework for rapid online adaptation of pre-trained Decision Transformers (DT) to unseen tasks. HDT inserts bottleneck adapters into the base DT and initializes them with a hyper-network conditioned on few-shot demonstrations. During adaptation the base DT stays frozen and only the adapter parameters are updated, which makes the method data- and parameter-efficient in both meta-imitation learning and meta-learning from observations. Empirical results on Meta-World ML45 show that HDT, while fine-tuning only 0.5% of the DT parameters, adapts faster than fine-tuning the whole model. When expert actions are unavailable, HDT attains high success rates with limited online rollouts (20–80 episodes); when expert actions are available, it also outperforms several baselines. The work demonstrates the importance of diverse multi-task pre-training and task-conditioned initialization for scalable, efficient adaptation of large transformer policies to novel tasks, with potential extensions to high-dimensional perception and embodied AI settings.
Abstract
Decision Transformers (DT) have demonstrated strong performance in offline reinforcement learning settings, but adapting them quickly to novel tasks remains challenging. To address this challenge, we propose a new framework, called Hyper-Decision Transformer (HDT), that can generalize to novel tasks from a handful of demonstrations in a data- and parameter-efficient manner. To this end, we augment the base DT with an adaptation module whose parameters are initialized by a hyper-network. When encountering unseen tasks, the hyper-network takes a handful of demonstrations as input and initializes the adaptation module accordingly. This initialization enables HDT to adapt efficiently to novel tasks by fine-tuning only the adaptation module. We validate HDT's generalization capability on object manipulation tasks. We find that with a single expert demonstration and fine-tuning only 0.5% of DT parameters, HDT adapts faster to unseen tasks than fine-tuning the whole DT model. Finally, we explore a more challenging setting where expert actions are not available, and we show that HDT outperforms state-of-the-art baselines in terms of task success rates by a large margin.
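
The mechanism described in the abstract (a frozen base model, a small adaptation module, and a hyper-network that initializes that module from demonstrations) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The layer sizes, the single linear hyper-network, the ReLU bottleneck, the residual connection, the squared-error objective, and all names (`init_adapter`, `forward`, `adapt_step`, `W_base`, `W_hyper`) are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN = 32      # hypothetical hidden size of one DT block
BOTTLENECK = 4   # hypothetical adapter bottleneck width
DEMO_DIM = 16    # hypothetical size of a demonstration embedding

# Frozen base layer standing in for one pre-trained DT block.
W_base = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))

# Hypothetical hyper-network: one linear map from a demonstration
# embedding to the flattened adapter weights (down- and up-projection).
N_ADAPTER = 2 * HIDDEN * BOTTLENECK
W_hyper = rng.normal(scale=0.01, size=(DEMO_DIM, N_ADAPTER))


def init_adapter(demo_embedding):
    """Generate initial adapter weights from a demonstration embedding."""
    flat = demo_embedding @ W_hyper
    down = flat[: HIDDEN * BOTTLENECK].reshape(HIDDEN, BOTTLENECK)
    up = flat[HIDDEN * BOTTLENECK:].reshape(BOTTLENECK, HIDDEN)
    return {"down": down, "up": up}


def forward(x, adapter):
    """Frozen base layer followed by a residual bottleneck adapter."""
    h = np.tanh(x @ W_base)
    z = np.maximum(h @ adapter["down"], 0.0)  # ReLU inside the bottleneck
    return h + z @ adapter["up"]


def adapt_step(x, target, adapter, lr=0.01):
    """One gradient step on 0.5 * squared error; W_base is never updated."""
    h = np.tanh(x @ W_base)
    pre = h @ adapter["down"]
    z = np.maximum(pre, 0.0)
    err = (h + z @ adapter["up"]) - target       # gradient w.r.t. the output
    grad_up = z.T @ err
    grad_down = h.T @ ((err @ adapter["up"].T) * (pre > 0))
    return {
        "down": adapter["down"] - lr * grad_down,
        "up": adapter["up"] - lr * grad_up,
    }


# Usage with random stand-in data (no real demonstrations or rollouts).
demo = rng.normal(size=DEMO_DIM)
adapter = init_adapter(demo)
x = rng.normal(size=(8, HIDDEN))
target = rng.normal(size=(8, HIDDEN))
base_before = W_base.copy()
new_adapter = adapt_step(x, target, adapter)
trainable_fraction = N_ADAPTER / (N_ADAPTER + W_base.size)
```

In this toy setup the adapter holds a fifth of all weights, far more than the 0.5% reported for HDT. The sketch only shows the division of labor: the hyper-network supplies a task-conditioned starting point, and the subsequent updates touch the adapter alone.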
