Decision ConvFormer: Local Filtering in MetaFormer is Sufficient for Decision Making

Jeonghye Kim, Suyoung Lee, Woojun Kim, Youngchul Sung

TL;DR

The paper addresses offline reinforcement learning by rethinking the token-mixing mechanism in MetaFormer-based action predictors. It shows that the attention module in Decision Transformer (DT) is poorly matched to the local, Markovian dependencies in RL trajectories, and introduces Decision ConvFormer (DC), which replaces attention with a simple, dataset-informed convolution mixer operating on local sequences of return-to-go (RTG), state, and action tokens. DC achieves state-of-the-art results across MuJoCo, AntMaze, and Atari with substantially fewer parameters and lower compute, and its online fine-tuning variant (ODC) maintains strong performance. The work also demonstrates improved generalization to unseen target returns, argues that a local, convolution-based mixer is a practical choice for decision making in RL, and outlines potential hybrids for long-horizon tasks.

Abstract

The recent success of Transformer in natural language processing has sparked its use in various domains. In offline reinforcement learning (RL), Decision Transformer (DT) is emerging as a promising model based on Transformer. However, we discovered that the attention module of DT is not appropriate to capture the inherent local dependence pattern in trajectories of RL modeled as a Markov decision process. To overcome the limitations of DT, we propose a novel action sequence predictor, named Decision ConvFormer (DC), based on the architecture of MetaFormer, which is a general structure to process multiple entities in parallel and understand the interrelationship among the multiple entities. DC employs local convolution filtering as the token mixer and can effectively capture the inherent local associations of the RL dataset. In extensive experiments, DC achieved state-of-the-art performance across various standard RL benchmarks while requiring fewer resources. Furthermore, we show that DC better understands the underlying meaning in data and exhibits enhanced generalization capability.
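
To make the idea of "local convolution filtering as the token mixer" concrete, here is a minimal sketch of a causal, per-channel convolution over a short token sequence. It is an illustration under stated assumptions, not the authors' implementation: the function name `causal_conv_mixer`, the window length of 3, the zero-padding at the start of the sequence, and the toy filter values are all hypothetical choices made for this example.

```python
from typing import List

Vector = List[float]


def causal_conv_mixer(tokens: List[Vector], filters: List[Vector]) -> List[Vector]:
    """Mix each token with its recent predecessors, one filter per channel.

    tokens:  T token embeddings, each of dimension D
    filters: D filters, each of length K; filters[d][k] weights the token
             k steps in the past for channel d (k = 0 is the current token)
    """
    num_tokens = len(tokens)
    dim = len(filters)
    mixed = []
    for t in range(num_tokens):
        out = [0.0] * dim
        for d in range(dim):
            for k, weight in enumerate(filters[d]):
                if t - k >= 0:  # positions before the sequence start count as zero
                    out[d] += weight * tokens[t - k][d]
        mixed.append(out)
    return mixed


# Toy interleaved trajectory (RTG, state, action, RTG, state, action), D = 2.
# The values are made up purely for illustration.
tokens = [
    [1.0, 0.0],
    [0.0, 1.0],
    [2.0, 2.0],
    [1.0, 1.0],
    [0.0, 3.0],
    [4.0, 0.0],
]
# Assumed window of K = 3 tokens: channel 0 averages the window,
# channel 1 simply copies the previous token.
filters = [
    [1 / 3, 1 / 3, 1 / 3],
    [0.0, 1.0, 0.0],
]
mixed = causal_conv_mixer(tokens, filters)
```

Each output token depends only on itself and the few tokens before it, so the mixer has a fixed, small set of weights per channel instead of an attention score for every pair of positions. This is the sense in which the filtering is local.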

Paper Structure (33 sections, 6 equations, 6 figures, 19 tables)

Figures (6)

  • Figure 1: The network architecture of MetaFormer, DT, and DC.
  • Figure 2: The local dependence graph of an offline RL dataset: blue arrows represent the Markov property, red arrows indicate the causal interrelation within a single timestep, and the gray dotted line shows the correlation between adjacent returns.
  • Figure 3: Motivating results in hopper-medium: (a) attention scores of DT (1st layer), (b) attention scores of direct learning (1st layer), and (c) performance comparison.
  • Figure 4: The overall convolution operation of DC.
  • Figure 5: Inference performance with zeroed out modals in hopper-medium.
  • ...and 1 more figure