Decision SpikeFormer: Spike-Driven Transformer for Decision Making

Wei Huang, Qinying Gu, Nanyang Ye

TL;DR

DSFormer addresses the energy-efficiency challenge of offline reinforcement learning with a spike-driven Transformer that operates on decision sequences. It uses Temporal Spiking Self-Attention (TSSA) to capture global temporal dependencies and Positional Spiking Self-Attention (PSSA) to model local positional relations, together with Progressive Threshold-dependent Batch Normalization (PTBN) to preserve spiking dynamics during training and inference. On the D4RL benchmark, DSFormer outperforms both spike-driven and ANN baselines while achieving about 78.4% energy savings, which points to practical potential for low-power embodied AI. The work advances spike-based sequence modeling for decision making and offers a path toward neuromorphic deployment of offline RL systems.

Abstract

Offline reinforcement learning (RL) enables policy training solely on pre-collected data, avoiding direct environment interaction, which is a crucial benefit for energy-constrained embodied AI applications. Although Artificial Neural Network (ANN)-based methods perform well in offline RL, their high computational and energy demands motivate exploration of more efficient alternatives. Spiking Neural Networks (SNNs) show promise for such tasks, given their low power consumption. In this work, we introduce DSFormer, the first spike-driven transformer model designed to tackle offline RL via sequence modeling. Unlike existing SNN transformers focused on spatial dimensions for vision tasks, we develop Temporal Spiking Self-Attention (TSSA) and Positional Spiking Self-Attention (PSSA) in DSFormer to capture the temporal and positional dependencies essential for sequence modeling in RL. Additionally, we propose Progressive Threshold-dependent Batch Normalization (PTBN), which combines the benefits of LayerNorm and BatchNorm to preserve temporal dependencies while maintaining the spiking nature of SNNs. Comprehensive results on the D4RL benchmark show DSFormer's superiority over both SNN and ANN counterparts while achieving 78.4% energy savings, highlighting its advantages in both energy efficiency and task performance. Code and models are public at https://wei-nijuan.github.io/DecisionSpikeFormer.

Paper Structure

This paper contains 27 sections, 1 theorem, 10 equations, 3 figures, 5 tables.

Key Result

Theorem 1

Let $X^t$ represent the input at time step $t$. Then the joint entropy after concatenation satisfies: $H\left(X^1, X^2, \ldots, X^T\right)=H\left(X^1\right)+H\left(X^2 \mid X^1\right)+\cdots+H\left(X^T \mid X^1, X^2, \ldots, X^{T-1}\right)$. Since $X^t$ and $X^{t+1}$ are not independent according to
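
The identity in Theorem 1 is the standard chain rule for joint entropy. As a minimal numerical sketch for the $T = 2$ case, the snippet below checks it on a small joint distribution over two binary variables. The distribution `joint` and all function names are hypothetical and chosen only for illustration; they do not come from the paper. The sketch also shows the general fact that, for dependent variables, the joint entropy is strictly smaller than the sum of the marginal entropies.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def marginal(joint, axis):
    """Marginal distribution of one coordinate of a joint pmf over pairs."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out


def conditional_entropy(joint):
    """H(X2 | X1) = sum over x1 of p(x1) * H(X2 | X1 = x1)."""
    total = 0.0
    for x1, p_x1 in marginal(joint, 0).items():
        cond = [p / p_x1 for (a, _), p in joint.items() if a == x1]
        total += p_x1 * entropy(cond)
    return total


# Hypothetical joint pmf p(x1, x2): the two binary variables tend to agree,
# so they are dependent (illustrative numbers, not data from the paper).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

h_joint = entropy(joint.values())
h_x1 = entropy(marginal(joint, 0).values())
h_x2 = entropy(marginal(joint, 1).values())
h_x2_given_x1 = conditional_entropy(joint)

# Chain rule: H(X1, X2) = H(X1) + H(X2 | X1)
print(abs(h_joint - (h_x1 + h_x2_given_x1)) < 1e-12)
# Dependence: H(X1, X2) < H(X1) + H(X2)
print(h_joint < h_x1 + h_x2)
```

Replacing `joint` with a product distribution (for example, all four entries equal to 0.25) makes the second comparison an equality, which is the independent case.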

Figures (3)

  • Figure 1: The overall architecture of DSFormer. The input sequence $I_l$ is embedded, repeated $T$ times, fed into the Decoder Blocks through a spike-driven self-attention and MLP layer in each block, and finally passed to the Prediction Head to generate next action predictions.
  • Figure 2: Self-attention mechanisms with different computational complexity. (a) VSA inherited from Vaswani et al. (2017). (b) SSSA, a spike version of VSA. (c) TSSA that concatenates inputs across the temporal dimension before self-attention. (d) PSSA that incorporates positional bias. To simplify the plotting, we set $T = 3$.
  • Figure 3: Design of tdLN. Each cube represents the feature map at timestep $t$ and calculates statistics along the $D$ dimension to obtain mean and variance with a shape of $B \times N \times T$ for normalization.
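
The statistics described in the Figure 3 caption can be sketched in a few lines. The code below is a simplified illustration under stated assumptions, not the authors' implementation: it assumes the input is a nested list indexed `[t][b][n][d]`, computes the mean and variance along the $D$ dimension, and returns them indexed `[b][n][t]` so that their shape is $B \times N \times T$. The function names `feature_stats` and `normalize` and the `eps` constant are hypothetical, and any threshold-dependent scaling or learnable affine parameters are omitted because the caption does not specify them.

```python
import math


def feature_stats(x):
    """Mean and variance along D for input x indexed [t][b][n][d].

    Returns (mean, var), each indexed [b][n][t], i.e. shape B x N x T.
    """
    T, B, N = len(x), len(x[0]), len(x[0][0])
    mean = [[[0.0] * T for _ in range(N)] for _ in range(B)]
    var = [[[0.0] * T for _ in range(N)] for _ in range(B)]
    for t in range(T):
        for b in range(B):
            for n in range(N):
                feats = x[t][b][n]
                m = sum(feats) / len(feats)
                mean[b][n][t] = m
                var[b][n][t] = sum((v - m) ** 2 for v in feats) / len(feats)
    return mean, var


def normalize(x, eps=1e-5):
    """Normalize each feature vector with its own per-(b, n, t) statistics."""
    mean, var = feature_stats(x)
    return [
        [
            [
                [(v - mean[b][n][t]) / math.sqrt(var[b][n][t] + eps)
                 for v in x[t][b][n]]
                for n in range(len(x[t][b]))
            ]
            for b in range(len(x[t]))
        ]
        for t in range(len(x))
    ]


# Toy input with T = 3, B = 2, N = 2, D = 4 (arbitrary illustrative values).
x = [[[[float(t + 2 * b + 3 * n + d * (t + 1)) for d in range(4)]
       for n in range(2)] for b in range(2)] for t in range(3)]
mean, var = feature_stats(x)
print(len(mean), len(mean[0]), len(mean[0][0]))  # B, N, T
```

Because the statistics are computed per timestep, token, and batch element, they do not depend on other samples in the batch, which is the LayerNorm-like property the caption's shape $B \times N \times T$ implies.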

Theorems & Definitions (1)

  • Theorem 1