Decision Transformer as a Foundation Model for Partially Observable Continuous Control
Xiangyuan Zhang, Weichao Mao, Haoran Qiu, Tamer Başar
TL;DR
This work reframes partially observable continuous control as a sequence-prediction problem, using a Decision Transformer (DT) initialized from a pre-trained GPT-2 language model and fine-tuned with low-rank adaptation (LoRA) on offline demonstrations. DT predicts the current action from a short history of observations, actions, and rewards. At inference time, a target-return prompt stands in for the rewards-to-go, which are not known in advance. Across five control tasks, ranging from aerospace maneuvers to PDE control, DT shows zero-shot generalization to unseen tasks and rapid few-shot adaptation, often surpassing expert-level performance with modest amounts of demonstration data. The results suggest that DT is a viable foundation model for general control: it generalizes across tasks, is data-efficient, and acts as a single end-to-end controller with no explicit state estimator.
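To illustrate the target-return prompting described above, here is a minimal sketch rather than the authors' implementation. It assumes the convention of the original Decision Transformer, in which the return-to-go token starts at the target return and is reduced by each reward received. The function names (`make_context`, `record_step`, `build_prompt`), the token layout, and the window length are hypothetical.

```python
from collections import deque

def make_context(target_return, horizon):
    """Rolling buffer holding the last `horizon` (return-to-go, observation, action) triples."""
    return {"rtg": float(target_return), "steps": deque(maxlen=horizon)}

def record_step(ctx, observation, action, reward):
    """Store the finished step, then lower the return-to-go by the reward received."""
    ctx["steps"].append((ctx["rtg"], tuple(observation), tuple(action)))
    ctx["rtg"] -= float(reward)

def build_prompt(ctx, observation):
    """Interleave the stored history and end with the current return-to-go and
    observation; a sequence model would predict the action that comes next."""
    tokens = []
    for rtg, obs, act in ctx["steps"]:
        tokens += [("rtg", rtg), ("obs", obs), ("act", act)]
    tokens += [("rtg", ctx["rtg"]), ("obs", tuple(observation))]
    return tokens

# Hypothetical rollout: target return 10, context window of 2 past steps.
ctx = make_context(target_return=10.0, horizon=2)
record_step(ctx, observation=[0.0, 1.0], action=[0.5], reward=1.0)
record_step(ctx, observation=[0.1, 0.9], action=[0.4], reward=2.0)
record_step(ctx, observation=[0.2, 0.8], action=[0.3], reward=3.0)
prompt = build_prompt(ctx, observation=[0.3, 0.7])
```

Only observations enter the prompt, never the full state, which is how a history-based controller of this kind can avoid a separate estimator. The oldest step drops out once the window is full.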
Abstract
Closed-loop control of nonlinear dynamical systems with partial-state observability demands expert knowledge of a diverse and less standardized set of theoretical tools. It also requires a delicate integration of controller and estimator designs to achieve the desired system behavior. To establish a general controller synthesis framework, we explore the Decision Transformer (DT) architecture. Specifically, we first frame the control task as predicting the current optimal action from past observations, actions, and rewards, which eliminates the need for a separate estimator design. We then use pre-trained language models, specifically the Generative Pre-trained Transformer (GPT) series, to initialize DT, and subsequently train it for control tasks using low-rank adaptation (LoRA). Our comprehensive experiments across five distinct control tasks, ranging from maneuvering aerospace systems to controlling partial differential equations (PDEs), demonstrate DT's capability to capture the parameter-agnostic structures intrinsic to control tasks. DT exhibits remarkable zero-shot generalization to completely new tasks and rapidly surpasses expert performance levels with a minimal amount of demonstration data. These findings highlight the potential of DT as a foundational controller for general control applications.
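For readers unfamiliar with LoRA, the following minimal sketch shows the general idea in plain Python. It is not the authors' training code. It assumes the standard LoRA formulation, in which a frozen weight matrix W is augmented with a trainable low-rank product scaled by alpha / r, and the matrix B starts at zero. The layer sizes, rank, and `alpha` value are hypothetical.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_weight(w, lora_b, lora_a, alpha):
    """Effective weight W + (alpha / r) * B @ A, where r is the number of rows of A."""
    scale = alpha / len(lora_a)
    delta = matmul(lora_b, lora_a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

random.seed(0)
d_out, d_in, rank = 8, 8, 2  # hypothetical sizes, far smaller than a real GPT-2 layer

# Frozen pre-trained weight (random here, standing in for a language-model layer).
w = [[random.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]
# Trainable factors: A is small random, B is zero so training starts exactly at W.
lora_a = [[random.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
lora_b = [[0.0] * rank for _ in range(d_out)]

w_init = lora_weight(w, lora_b, lora_a, alpha=4.0)
trainable = rank * (d_in + d_out)  # parameters updated during adaptation
full = d_in * d_out                # parameters in the frozen matrix
```

Because B is zero at the start, `w_init` equals `w`, so adaptation begins from the pre-trained model. Only the two small factors are updated afterwards: 32 parameters against 64 in this toy case.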
