Predictive Coding for Decision Transformer

Tung M. Luu, Donghoon Lee, Chang D. Yoo

TL;DR

PCDT tackles the limitations of return-conditioned DTs in offline goal-conditioned RL by conditioning actions on predictive codings that encode future information. It employs a two-stage architecture with a trajectory autoencoder to produce predictive latent codes and a causal transformer that uses these codes to predict actions, enabling learning from reward-free data and enhancing stitching in long-horizon tasks. Across AntMaze, FrankaKitchen, and a Sawyer robot, PCDT achieves competitive or superior performance to value-based and other transformer-based approaches, particularly in challenging stitching scenarios. The approach broadens offline RL applicability by leveraging large unlabeled datasets and offering robust, future-aware decision making in real-world robotics.

Abstract

Recent work in offline reinforcement learning (RL) has demonstrated the effectiveness of formulating decision-making as return-conditioned supervised learning. Notably, the decision transformer (DT) architecture has shown promise across various domains. However, despite its initial success, DTs have underperformed on several challenging datasets in goal-conditioned RL. This limitation stems from the inefficiency of return conditioning for guiding policy learning, particularly in unstructured and suboptimal datasets, which leaves DTs unable to learn temporal compositionality effectively. This problem may be further exacerbated in long-horizon sparse-reward tasks. To address this challenge, we propose the Predictive Coding for Decision Transformer (PCDT) framework, which leverages generalized future conditioning to enhance DT methods. PCDT extends the DT architecture by conditioning on predictive codings, enabling decision-making based on both past and future factors and thereby improving generalization. Through extensive experiments on eight datasets from the AntMaze and FrankaKitchen environments, our proposed method achieves performance on par with or surpassing existing popular value-based and transformer-based methods in offline goal-conditioned RL. We also evaluate our method on a goal-reaching task with a physical robot.
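
To make the idea of conditioning on predictive codings concrete, the following is a minimal sketch of how a DT-style input sequence might be assembled when latent codes take the place of return tokens. The per-timestep ordering (code, state, action), the `Token` layout, and the helper names `interleave_tokens` and `visible_positions` are assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical token layout: (modality, timestep, vector).
Token = Tuple[str, int, Sequence[float]]


def interleave_tokens(
    codes: Sequence[Sequence[float]],
    states: Sequence[Sequence[float]],
    actions: Sequence[Sequence[float]],
) -> List[Token]:
    """Build a DT-style sequence with latent codes in place of returns.

    Assumed ordering per timestep: (code z_t, state s_t, action a_t).
    """
    if not (len(codes) == len(states) == len(actions)):
        raise ValueError("codes, states and actions must share length k")
    tokens: List[Token] = []
    for t, (z, s, a) in enumerate(zip(codes, states, actions)):
        tokens.append(("code", t, z))
        tokens.append(("state", t, s))
        tokens.append(("action", t, a))
    return tokens


def visible_positions(tokens: Sequence[Token], t: int) -> List[int]:
    """Indices a causal model may attend to when predicting action a_t.

    Under causal masking this is every token up to and including the
    state token at timestep t; the action a_t itself is not visible.
    """
    target = next(
        i for i, (kind, step, _) in enumerate(tokens)
        if kind == "state" and step == t
    )
    return list(range(target + 1))


if __name__ == "__main__":
    k = 3  # sub-trajectory length
    codes = [[0.1 * t] for t in range(k)]      # placeholder latent codes
    states = [[float(t), 0.0] for t in range(k)]
    actions = [[-float(t)] for t in range(k)]
    seq = interleave_tokens(codes, states, actions)
    print([kind for kind, _, _ in seq])
    print(visible_positions(seq, 1))
```

In this sketch the only change relative to a return-conditioned sequence is the first token of each triple, which is why the approach can be trained on reward-free data: nothing in the input depends on a reward signal.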

Paper Structure (14 sections, 6 equations, 8 figures, 2 tables, 1 algorithm)

Figures (8)

  • Figure 1: Overview of the PCDT model. PCDT takes as input a length-$k$ sub-trajectory associated with predictive latent codings to make decisions. During training, predictive codings are extracted from states within the same sub-trajectory. During inference, PCDT generates the next action in an autoregressive manner, akin to DT (Chen et al., 2021).
  • Figure 2: We input the sequence of state-dummy token pairs, concatenated with the target goal, into the trajectory encoder. Within this sequence, states are randomly masked. The trajectory decoder receives stacked inputs of latent codes and masked tokens to reconstruct the input states. In addition to reconstruction, we use the same latent codes to predict future states, aiming to enhance predictiveness.
  • Figure 3: We evaluate the performance of PCDT on three sparse-reward goal-conditioned tasks. From left to right: AntMaze and FrankaKitchen, taken from the D4RL benchmark (Fu et al., 2020), and a physical Rethink Sawyer robot performing a goal-reaching task.
  • Figure 4: A plot illustrating the average performance of each algorithm listed in Table \ref{tab:comparison}. The bars are color-coded based on a simplified algorithm categorization.
  • Figure 5: The effect of different lengths of future states ($L$) during the learning of predictive coding on agent performance.
  • ...and 3 more figures
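
The encoder input described in the Figure 2 caption, and the future-state horizon $L$ from the Figure 5 caption, can be sketched roughly as follows. This is an illustration under stated assumptions: the goal token is placed at the end of the sequence, a masked state is replaced by a placeholder `"mask"` token, and `mask_ratio`, `build_encoder_input`, and `future_targets` are hypothetical names that do not appear in the paper.

```python
import random
from typing import List, Optional, Sequence, Tuple

State = Sequence[float]
EncToken = Tuple[str, Optional[State]]


def build_encoder_input(
    states: Sequence[State],
    goal: State,
    mask_ratio: float,
    rng: random.Random,
) -> Tuple[List[EncToken], List[int]]:
    """Return (tokens, masked_timesteps) for one sub-trajectory.

    Each timestep contributes a state token (or a mask placeholder if
    that state was randomly hidden) followed by a dummy token. The goal
    is appended last; that position is an assumption of this sketch.
    """
    k = len(states)
    n_masked = int(round(mask_ratio * k))
    masked = sorted(rng.sample(range(k), n_masked))
    masked_set = set(masked)
    tokens: List[EncToken] = []
    for t, s in enumerate(states):
        tokens.append(("mask", None) if t in masked_set else ("state", s))
        tokens.append(("dummy", None))
    tokens.append(("goal", goal))
    return tokens, masked


def future_targets(
    trajectory: Sequence[State], t_end: int, horizon: int
) -> List[State]:
    """States after index t_end, at most `horizon` (L) of them.

    The slice is truncated when the trajectory ends early.
    """
    return list(trajectory[t_end + 1 : t_end + 1 + horizon])


if __name__ == "__main__":
    rng = random.Random(0)
    trajectory = [[float(t)] for t in range(10)]
    sub = trajectory[0:4]  # length-k sub-trajectory with k = 4
    tokens, masked = build_encoder_input(sub, goal=[9.0], mask_ratio=0.5, rng=rng)
    print([kind for kind, _ in tokens], masked)
    print(future_targets(trajectory, t_end=3, horizon=4))
```

The masked timesteps are the positions a decoder would be asked to reconstruct, while `future_targets` returns the states that the same latent codes would additionally be trained to predict.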