Hyper-Decision Transformer for Efficient Online Policy Adaptation

Mengdi Xu; Yuchen Lu; Yikang Shen; Shun Zhang; Ding Zhao; Chuang Gan

TL;DR

This work introduces the Hyper-Decision Transformer (HDT), a parameter-efficient framework that enables rapid online adaptation of pre-trained Decision Transformers (DTs) to unseen tasks. It inserts bottleneck adapters whose parameters are initialized by a demonstration-conditioned hyper-network. During adaptation, the base DT stays frozen and only the adapter parameters are updated. Because the hyper-network uses few-shot demonstrations to initialize these adapters, HDT achieves strong data and parameter efficiency in both meta-imitation learning (meta-IL) and meta-learning from observations (meta-LfO). On Meta-World ML45, HDT adapts faster than full fine-tuning while updating only 0.5% of the parameters. When expert actions are unavailable, it attains high success rates with limited online rollouts (20–80 episodes), and it also outperforms several baselines when expert actions are available. The work highlights the importance of diverse multi-task pre-training and task-conditioned initialization for scalable, efficient adaptation of large transformer policies to novel tasks, with potential extensions to high-dimensional perception and embodied AI settings.
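To make the parameter-efficiency argument concrete, the sketch below counts trainable parameters when a bottleneck adapter is added to every decoder block and all base weights are frozen. It is a minimal sketch under stated assumptions. It assumes a GPT-2-style decoder block, the backbone used by the original Decision Transformer, and a two-layer adapter: a down-projection to a bottleneck and an up-projection back, each with a bias. Embeddings, prediction heads, and the hyper-network itself are ignored. The sizes in the usage lines are hypothetical, so the printed fractions do not reproduce the paper's 0.5% figure.

```python
def gpt2_block_params(d_model: int) -> int:
    """Parameters in one GPT-2-style decoder block (assumed architecture)."""
    qkv = d_model * 3 * d_model + 3 * d_model          # fused Q/K/V projection + bias
    attn_out = d_model * d_model + d_model              # attention output projection + bias
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    layer_norms = 2 * (2 * d_model)                     # two LayerNorms, scale + shift each
    return qkv + attn_out + mlp + layer_norms


def adapter_params(d_model: int, bottleneck: int) -> int:
    """Bottleneck adapter: d_model -> bottleneck -> d_model, each projection with a bias."""
    return (d_model * bottleneck + bottleneck) + (bottleneck * d_model + d_model)


def trainable_fraction(d_model: int, n_layers: int, bottleneck: int) -> float:
    """Share of all parameters that remain trainable when only the adapters are updated."""
    frozen = n_layers * gpt2_block_params(d_model)
    trainable = n_layers * adapter_params(d_model, bottleneck)
    return trainable / (frozen + trainable)


if __name__ == "__main__":
    # Hypothetical sizes chosen for illustration only.
    for b in (4, 16, 64):
        print(f"bottleneck={b:3d}: {trainable_fraction(128, 3, b):.2%} trainable")
```

The adapter cost grows linearly in the bottleneck size, while each block grows quadratically in `d_model`. For this reason, the bottleneck width is the main lever on the trainable share. The Figure 4 ablation varies this same hyperparameter.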

Abstract

Decision Transformers (DT) have demonstrated strong performance in offline reinforcement learning settings, but quickly adapting them to novel, unseen tasks remains challenging. To address this challenge, we propose a new framework, called Hyper-Decision Transformer (HDT), that can generalize to novel tasks from a handful of demonstrations in a data- and parameter-efficient manner. To this end, we augment the base DT with an adaptation module whose parameters are initialized by a hyper-network. When HDT encounters an unseen task, the hyper-network takes a handful of demonstrations as input and initializes the adaptation module accordingly. This initialization enables HDT to adapt efficiently to novel tasks by fine-tuning only the adaptation module. We validate HDT's generalization capability on object manipulation tasks. We find that with a single expert demonstration and fine-tuning only 0.5% of DT parameters, HDT adapts faster to unseen tasks than fine-tuning the whole DT model. Finally, we explore a more challenging setting where expert actions are not available, and we show that HDT outperforms state-of-the-art baselines in task success rate by a large margin.
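The recipe described above keeps the base DT frozen and fine-tunes only the adaptation module, starting from a task-informed point. A deliberately tiny scalar model can illustrate this recipe. In this hypothetical sketch, a frozen "pre-trained" weight `base.w` fits an old task y = 2x. An additive adapter weight `adapter.w` is the only trainable parameter, and plain gradient descent adapts it to a new task y = 3x. The starting value 0.9 stands in for a hyper-network's initialization and is invented for illustration. Nothing here reproduces HDT's actual model or training loop.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def is_adapter(name: str) -> bool:
    # Hypothetical naming convention: only "adapter.*" parameters are trainable.
    return name.startswith("adapter.")


def predict(params: Params, x: float) -> float:
    # Frozen base weight plus an additive (residual) adapter weight.
    return params["base.w"] * x + params["adapter.w"] * x


def mse(params: Params, data: List[Tuple[float, float]]) -> float:
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)


def grads(params: Params, data: List[Tuple[float, float]]) -> Params:
    # Both weights multiply x, so their MSE gradients are identical.
    g = sum(2 * (predict(params, x) - y) * x for x, y in data) / len(data)
    return {"base.w": g, "adapter.w": g}


def sgd_step(params: Params, g: Params, lr: float, trainable: Callable[[str], bool]) -> Params:
    # Frozen parameters are copied unchanged; only trainable ones move.
    return {n: (v - lr * g[n] if trainable(n) else v) for n, v in params.items()}


def adapt(params: Params, data, lr: float = 0.05, tol: float = 1e-3, max_steps: int = 1000):
    # Gradient descent on the adapter only, until the loss falls below tol.
    steps = 0
    while mse(params, data) > tol and steps < max_steps:
        params = sgd_step(params, grads(params, data), lr, is_adapter)
        steps += 1
    return params, steps


new_task = [(1.0, 3.0), (2.0, 6.0)]  # toy data for y = 3x
p_zero, steps_zero = adapt({"base.w": 2.0, "adapter.w": 0.0}, new_task)
p_hyper, steps_hyper = adapt({"base.w": 2.0, "adapter.w": 0.9}, new_task)  # hypothetical init
print(steps_zero, steps_hyper, p_zero["base.w"])
```

In this toy setting, the frozen weight never moves, and the task-informed start reaches the loss threshold in fewer steps than a zero start. The analogy is only that a better initial adapter shortens adaptation, which is the role the hyper-network plays in HDT.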

Paper Structure (36 sections, 3 equations, 9 figures, 9 tables, 4 algorithms)

Introduction
Related Work
Hyper Decision Transformer
Problem Formulation: Efficient Adaptation from Observations
Decision Transformer (DT) as the Pre-trained agent
Adaptation Module
Hyper-network
Algorithm
Experimental Setup
Environments and Offline Datasets
Baselines
Results and Discussions
Does HDT generalize to unseen tasks with parameter and data efficiency?
Adapter layers vs. other efficient fine-tuning methods in policy learning
Does the hyper-network encode task-specific information?
...and 21 more sections

Figures (9)

Figure 1: Efficient online policy adaptation of pre-trained transformer models with few-shot demonstrations. To facilitate data efficiency, we introduce a demonstration-conditioned adaptation module that helps leverage prior knowledge in the demonstration and guide exploration. When adapting to novel tasks, we only fine-tune the adaptation module to maintain parameter efficiency.
Figure 2: Model architecture of Hyper-Decision Transformer (HDT). Similar to DT, HDT takes recent contexts as input and outputs fine-grained actions. To encode task-specific information, HDT injects adapter layers into each decoder block. The adapter layers' parameters come from a stand-alone hyper-network that takes as input both action-free demonstrations and the decoder layer's ID.
Figure 3: Quantitative results on the Meta-World benchmark. Each curve is averaged across 5 seeds. We show (a) the training curves of our proposed HDT and the baselines, (b) adaptation performance with a one-shot demonstration containing expert actions (meta-IL), and (c) adaptation performance with a demonstration containing no expert actions (meta-LfO). When expert actions are unavailable, HDT outperforms the baselines by a large margin.
Figure 4: Ablation results showing the effects of model size and the number of training tasks. Decreasing the adapter's bottleneck hidden size slows convergence when expert actions are available (a) and causes a significant performance drop when they are not (b). Similar trends are observed when the base DT's model size is decreased (c, d). With 10 training tasks, HDT-small-train underperforms HDT.
Figure 5: Model Architecture of HDT-IA3.
...and 4 more figures
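Figure 2 shows a stand-alone hyper-network that maps an action-free demonstration and a decoder layer ID to that layer's adapter parameters. This data flow can be sketched in plain Python. Every concrete choice here is an assumption for illustration, not the paper's design. The demonstration encoder simply mean-pools the states. The hyper-network is a single linear map applied to the task embedding concatenated with a learned per-layer embedding. The adapter is a ReLU bottleneck with a residual connection. In HDT, the generated weights serve only as the initialization that is subsequently fine-tuned.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]
AdapterParams = Dict[str, list]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Dense matrix-vector product w @ x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def encode_demo(states: List[Vector]) -> Vector:
    """Hypothetical task encoder: mean-pool the states of an action-free demonstration."""
    return [sum(col) / len(states) for col in zip(*states)]


class LinearHyperNetwork:
    """Hypothetical hyper-network: [task embedding; layer embedding] -> flat adapter weights."""

    def __init__(self, task_dim: int, num_layers: int, layer_emb_dim: int,
                 d_model: int, bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.d, self.b = d_model, bottleneck
        # One learned embedding per decoder layer ID.
        self.layer_emb = [[rng.gauss(0.0, 0.1) for _ in range(layer_emb_dim)]
                          for _ in range(num_layers)]
        n_in = task_dim + layer_emb_dim
        n_out = 2 * d_model * bottleneck + bottleneck + d_model
        self.w = [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]

    def __call__(self, task_emb: Vector, layer_id: int) -> AdapterParams:
        # Condition on both the task and the layer, then slice the flat output.
        flat = matvec(self.w, task_emb + self.layer_emb[layer_id])
        d, b = self.d, self.b
        w_down = [flat[r * d:(r + 1) * d] for r in range(b)]            # b x d
        off = b * d
        b_down = flat[off:off + b]
        off += b
        w_up = [flat[off + r * b:off + (r + 1) * b] for r in range(d)]  # d x b
        off += d * b
        b_up = flat[off:off + d]
        return {"w_down": w_down, "b_down": b_down, "w_up": w_up, "b_up": b_up}


def adapter_forward(h: Vector, p: AdapterParams) -> Vector:
    """Residual bottleneck adapter: h + W_up relu(W_down h + b_down) + b_up."""
    z = [max(0.0, a + c) for a, c in zip(matvec(p["w_down"], h), p["b_down"])]
    delta = [a + c for a, c in zip(matvec(p["w_up"], z), p["b_up"])]
    return [x + y for x, y in zip(h, delta)]


# Usage with toy sizes: a 2-step demonstration of 4-dim states, 3 decoder layers.
demo_states = [[0.1, 0.2, 0.3, 0.4], [0.3, 0.0, 0.1, 0.2]]
task_emb = encode_demo(demo_states)
hyper = LinearHyperNetwork(task_dim=4, num_layers=3, layer_emb_dim=2, d_model=8, bottleneck=2)
adapters = [hyper(task_emb, layer_id) for layer_id in range(3)]
hidden = [1.0] * 8
out = adapter_forward(hidden, adapters[0])
```

Conditioning on a layer ID lets one shared hyper-network produce distinct adapters for every decoder block. Whether HDT encodes the layer ID in exactly this way is not stated here.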
