Decision Transformer as a Foundation Model for Partially Observable Continuous Control

Xiangyuan Zhang, Weichao Mao, Haoran Qiu, Tamer Başar

TL;DR

This work reframes partially observable continuous control as a sequence-prediction problem. A Decision Transformer (DT) is initialized from a pre-trained GPT-2 language model and trained with LoRA on offline demonstrations. DT predicts the current action from a short history of observations, actions, and rewards; at inference time, a target-return prompt stands in for the rewards-to-go, which are not known in advance. Across five control tasks, including aerospace-style maneuvers and PDE control, DT generalizes zero-shot to unseen tasks and adapts quickly in the few-shot setting, often surpassing expert-like policies with modest demonstration data. The results suggest DT is a viable foundation model for general control: it generalizes robustly, uses data efficiently, and serves as a unified, end-to-end controller without an explicit state estimator.
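
The sequence framing can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper: the function names, the tuple-based token layout, and the rule of subtracting each observed reward from the target return (the convention of the original Decision Transformer) are assumptions, and the paper's actual tokenization and context length may differ.

```python
from typing import List, Sequence, Tuple

Token = Tuple[str, float]


def rewards_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted reward-to-go: R_t is the sum of r_k for all k >= t."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def build_context(
    rtg: Sequence[float],
    observations: Sequence[float],
    actions: Sequence[float],
    context_len: int,
) -> List[Token]:
    """Interleave (reward-to-go, observation, action) over the last
    `context_len` steps. The newest step has no action yet; that action
    is what the model would be asked to predict."""
    if len(rtg) != len(observations) or len(actions) != len(observations) - 1:
        raise ValueError("expected T returns, T observations, and T-1 actions")
    start = max(0, len(observations) - context_len)
    tokens: List[Token] = []
    for t in range(start, len(observations)):
        tokens.append(("rtg", rtg[t]))
        tokens.append(("obs", observations[t]))
        if t < len(actions):
            tokens.append(("act", actions[t]))
    return tokens


def update_target_return(target: float, reward: float) -> float:
    """Assumed inference-time rule: shrink the prompt by the reward just seen."""
    return target - reward


# Training-time labels come from logged rewards (scalars here for brevity).
rtg = rewards_to_go([1.0, 2.0, 3.0])
context = build_context(rtg, [0.1, 0.2, 0.3], [1.0, -1.0], context_len=2)
```

During training the rewards-to-go are computed from logged trajectories as above; at inference they are unavailable, so a user-chosen target return seeds the first token and is updated after every step.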

Abstract

Closed-loop control of nonlinear dynamical systems with partial-state observability demands expert knowledge of a diverse, less standardized set of theoretical tools. Moreover, it requires a delicate integration of controller and estimator designs to achieve the desired system behavior. To establish a general controller synthesis framework, we explore the Decision Transformer (DT) architecture. Specifically, we first frame the control task as predicting the current optimal action based on past observations, actions, and rewards, eliminating the need for a separate estimator design. Then, we leverage the pre-trained language models, i.e., the Generative Pre-trained Transformer (GPT) series, to initialize DT and subsequently train it for control tasks using low-rank adaptation (LoRA). Our comprehensive experiments across five distinct control tasks, ranging from maneuvering aerospace systems to controlling partial differential equations (PDEs), demonstrate DT's capability to capture the parameter-agnostic structures intrinsic to control tasks. DT exhibits remarkable zero-shot generalization abilities for completely new tasks and rapidly surpasses expert performance levels with a minimal amount of demonstration data. These findings highlight the potential of DT as a foundational controller for general control applications.
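
For readers unfamiliar with LoRA, the following is a minimal sketch of the general technique, not the authors' implementation. A frozen weight matrix W is augmented with a trainable low-rank product B A scaled by alpha / r. The helper names, the rank, and the scaling value are hypothetical; the 768 hidden width corresponds to the smallest GPT-2 model and is used only to illustrate the parameter savings.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float) -> Vector:
    """Compute W x + (alpha / r) * B (A x).

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r)
    would receive gradient updates during fine-tuning.
    """
    r = len(A)
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * l for b, l in zip(base, low_rank)]


def trainable_fraction(d_out: int, d_in: int, r: int) -> float:
    """Ratio of LoRA parameters to the parameters of the frozen matrix."""
    return r * (d_in + d_out) / (d_out * d_in)


W = [[1.0, 2.0], [3.0, 4.0]]   # frozen pre-trained weights
A = [[1.0, 1.0]]               # rank r = 1
B_zero = [[0.0], [0.0]]        # common initialization: B = 0
x = [1.0, 1.0]

# With B = 0 the adapted layer reproduces the pre-trained layer exactly.
y_init = lora_forward(W, A, B_zero, x, alpha=2.0)

# After (hypothetical) training, B is nonzero and shifts the output.
y_tuned = lora_forward(W, A, [[1.0], [0.0]], x, alpha=2.0)
```

Because B starts at zero, fine-tuning begins from the pre-trained model's behavior. For a 768 x 768 matrix with r = 8, `trainable_fraction` gives 1/48, or roughly 2% of the original parameter count.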

Paper Structure (15 sections, 4 equations, 3 figures, 4 tables)

Figures (3)

  • Figure 1: Illustration of the DT Architecture. Left: An offline control dataset sampled from some unknown behavior policies. Middle: DT predicts an optimal action $a_t$ autoregressively based on an input sequence of rewards-to-go, observations, and actions. The pre-trained language weights of DT are kept frozen while we employ LoRA for control training. Right: DT quickly generalizes to new control tasks after seeing minimal offline demonstrations.
  • Figure 1: Left: PPO policy in Burgers with $\nu=10^{-2}$ and $\phi=0.125$. Right: We sample the Burgers parameters uniformly from the shaded region for generating the dataset and in-distribution tests. The out-of-distribution tests are manually selected.
  • Figure 2: PPO policy in CDR with $\nu=10^{-2}$, $c=0.1$, $\zeta=0$, and $\phi=0.1$.