AdaCred: Adaptive Causal Decision Transformers with Feature Crediting
Hemant Kumawat, Saibal Mukhopadhyay
TL;DR
AdaCred addresses offline reinforcement learning and imitation learning in settings where long input sequences and suboptimal data hinder policy learning. It models trajectories as causal latent graphs and applies a feature crediting and pruning mechanism. It introduces Adaptive Causal Decision Transformers, which combine a Spatial Transformer with a Temporal Causal Transformer to identify a minimal sufficient latent set $\mathbf{g}_t^{\min}$ for policy learning, with theoretical identifiability guarantees under the Markov and faithfulness assumptions. A two-stage training procedure and a sparsity regularizer support the approach, enabling efficient learning from short sequences while preserving performance. Empirically, AdaCred delivers superior or competitive results on Atari and Gym benchmarks in both offline RL and imitation learning, reaching peak performance with substantially shorter trajectories and improved computational efficiency.
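As a rough illustration of what crediting and pruning under a sparsity regularizer could look like, the sketch below scores each latent feature with a credit value, penalizes the credits with an L1 term, and zeroes out features whose credit falls below a threshold. The L1 form, the threshold rule, and all names (`l1_sparsity_penalty`, `prune_latents`, `kept_indices`) are assumptions made for illustration; the summary above does not specify the exact mechanism AdaCred uses.

```python
from typing import List


def l1_sparsity_penalty(credits: List[float], weight: float) -> float:
    """L1 penalty on per-feature credits (assumed form of a sparsity regularizer)."""
    return weight * sum(abs(c) for c in credits)


def kept_indices(credits: List[float], threshold: float) -> List[int]:
    """Indices of latent features whose absolute credit reaches the threshold."""
    return [i for i, c in enumerate(credits) if abs(c) >= threshold]


def prune_latents(
    latents: List[float], credits: List[float], threshold: float
) -> List[float]:
    """Zero out latent features whose credit is below the threshold (hypothetical rule)."""
    if len(latents) != len(credits):
        raise ValueError("latents and credits must have the same length")
    return [z if abs(c) >= threshold else 0.0 for z, c in zip(latents, credits)]


# Toy example: four latent features with hand-picked credits.
latents = [1.0, 2.0, -3.0, 4.0]
credits = [0.9, 0.05, 0.6, 0.0]

pruned = prune_latents(latents, credits, threshold=0.1)
retained = kept_indices(credits, threshold=0.1)
penalty = l1_sparsity_penalty(credits, weight=0.5)
```

In this toy setting the retained features play the role of a small sufficient subset of the latent state; in a trained model the credits would be learned jointly with the policy rather than fixed by hand.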
Abstract
Reinforcement learning (RL) can be formulated as a sequence modeling problem, where models predict future actions based on historical state-action-reward sequences. Current approaches typically require long trajectory sequences to model the environment in offline RL settings. However, these models tend to over-rely on memorizing long-term representations, which impairs their ability to attribute importance to trajectories and learned representations according to task-specific relevance. In this work, we introduce AdaCred, a novel approach that represents trajectories as causal graphs built from short-term action-reward-state sequences. Our model adaptively learns a control policy by assigning credit to learned representations and pruning those of low importance, retaining only the representations most relevant to the downstream task. Our experiments demonstrate that AdaCred-based policies require shorter trajectory sequences and consistently outperform conventional methods in both offline reinforcement learning and imitation learning environments.
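To make the sequence modeling view concrete, the following minimal sketch builds a short context of state, action, and reward tokens from which a model would predict the next action. The token layout, the helper name `build_context`, and the choice to use raw rewards are illustrative assumptions, not the tokenization used by AdaCred.

```python
from typing import Any, List, Tuple

Step = Tuple[Any, Any, float]  # (state, action, reward)
Token = Tuple[str, Any]        # (token type, value)


def build_context(trajectory: List[Step], t: int, context_len: int) -> List[Token]:
    """Return tokens for the last `context_len` steps before t, plus the state at t.

    A sequence model would consume these tokens and predict the action at step t.
    """
    if not 0 <= t < len(trajectory):
        raise IndexError("t must index a step in the trajectory")
    start = max(0, t - context_len)
    tokens: List[Token] = []
    for state, action, reward in trajectory[start:t]:
        tokens += [("state", state), ("action", action), ("reward", reward)]
    # The current state is known; its action is the prediction target.
    tokens.append(("state", trajectory[t][0]))
    return tokens


# Toy trajectory with symbolic states and actions.
trajectory = [("s0", "a0", 0.0), ("s1", "a1", 1.0), ("s2", "a2", 0.0), ("s3", "a3", 2.0)]
context = build_context(trajectory, t=3, context_len=2)
```

A smaller `context_len` shortens the input sequence, which is the regime the abstract argues AdaCred can operate in.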
