Predictive Coding for Decision Transformer
Tung M. Luu, Donghoon Lee, Chang D. Yoo
TL;DR
PCDT addresses the limitations of return-conditioned decision transformers (DTs) in offline goal-conditioned RL by conditioning actions on predictive codings that encode future information. It uses a two-stage architecture: a trajectory autoencoder first produces predictive latent codes, and a causal transformer then uses these codes to predict actions. This design allows learning from reward-free data and improves stitching in long-horizon tasks. Across AntMaze, FrankaKitchen, and a Sawyer robot, PCDT performs on par with or better than value-based and other transformer-based approaches, particularly in challenging stitching scenarios. Because it does not require reward labels, the approach extends offline RL to large unlabeled datasets and to goal-reaching tasks on real robots.
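The two-stage data flow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `toy_future_code` is a hypothetical stand-in for the learned trajectory autoencoder (it simply averages upcoming states), `horizon` is an invented parameter, and the (code, state, action) token order is an assumption modeled on the (return, state, action) layout of the original decision transformer.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
Token = Tuple[str, Vector]


def toy_future_code(states: Sequence[Vector], t: int, horizon: int) -> Vector:
    """Hypothetical stand-in for the trajectory autoencoder.

    Summarizes the next `horizon` states by their element-wise mean.
    A learned encoder would replace this in practice.
    """
    # Fall back to the current state when no future states remain.
    window = list(states[t + 1 : t + 1 + horizon]) or [states[t]]
    dim = len(window[0])
    return tuple(sum(s[i] for s in window) / len(window) for i in range(dim))


def build_tokens(
    states: Sequence[Vector], actions: Sequence[Vector], horizon: int = 2
) -> List[Token]:
    """Interleave (code, state, action) tokens for a causal sequence model.

    Assumed layout: the future code takes the slot that the return-to-go
    token occupies in a return-conditioned decision transformer.
    """
    tokens: List[Token] = []
    for t, (s, a) in enumerate(zip(states, actions)):
        z = toy_future_code(states, t, horizon)
        tokens += [("code", z), ("state", s), ("action", a)]
    return tokens


# Toy 1-D trajectory: no reward signal is needed to build the sequence.
states = [(0.0,), (1.0,), (2.0,), (3.0,)]
actions = [(1.0,), (1.0,), (1.0,), (1.0,)]
tokens = build_tokens(states, actions)
```

Note that no reward appears anywhere in the sequence, which is what would let this kind of conditioning work on reward-free data.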
Abstract
Recent work in offline reinforcement learning (RL) has demonstrated the effectiveness of formulating decision-making as return-conditioned supervised learning. Notably, the decision transformer (DT) architecture has shown promise across various domains. However, despite this initial success, DTs have underperformed on several challenging datasets in goal-conditioned RL. This limitation stems from the inefficiency of return conditioning for guiding policy learning, particularly in unstructured and suboptimal datasets, which causes DTs to fail to learn temporal compositionality effectively. The problem may be further exacerbated in long-horizon sparse-reward tasks. To address this challenge, we propose the Predictive Coding for Decision Transformer (PCDT) framework, which leverages generalized future conditioning to enhance DT methods. PCDT extends the DT architecture by conditioning on predictive codings, which enables decision-making based on both past and future factors and thereby improves generalization. Through extensive experiments on eight datasets from the AntMaze and FrankaKitchen environments, our proposed method achieves performance on par with or surpassing existing popular value-based and transformer-based methods in offline goal-conditioned RL. We also evaluate our method on a goal-reaching task with a physical robot.
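To make the weakness of return conditioning in sparse-reward tasks concrete, the sketch below computes the return-to-go signal that a return-conditioned DT receives at each timestep. The two reward sequences are hypothetical examples, not data from the paper, and the rewards are undiscounted as in the original decision transformer.

```python
from typing import List, Sequence


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted return-to-go: the sum of rewards from step t to the end."""
    rtg: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg


# Hypothetical sparse-reward trajectories: reward 1 only on reaching the goal.
reaches_goal = [0.0, 0.0, 0.0, 1.0]
misses_goal = [0.0, 0.0, 0.0, 0.0]

rtg_success = returns_to_go(reaches_goal)  # [1.0, 1.0, 1.0, 1.0]
rtg_failure = returns_to_go(misses_goal)   # [0.0, 0.0, 0.0, 0.0]
```

In both cases the conditioning value is constant over the trajectory, so it carries no information about where the goal lies or which intermediate states are useful. A trajectory that misses the goal but passes through useful states is labeled zero throughout, which is one way to see why stitching segments from suboptimal data is hard under return conditioning.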
