Decision SpikeFormer: Spike-Driven Transformer for Decision Making

Wei Huang; Qinying Gu; Nanyang Ye

TL;DR

DSFormer addresses the energy cost of offline reinforcement learning (RL) with a spike-driven Transformer that operates on decision sequences. It uses Temporal Spiking Self-Attention (TSSA) to capture global temporal dependencies and Positional Spiking Self-Attention (PSSA) to model local positional relations, together with Progressive Threshold-dependent Batch Normalization (PTBN), which preserves spiking dynamics during training and inference. On the D4RL benchmark, DSFormer outperforms both spiking neural network (SNN) and artificial neural network (ANN) baselines while achieving 78.4% energy savings, which highlights its potential for low-power embodied AI. The work advances spike-based sequence modeling for decision making and offers a path toward neuromorphic deployment of offline RL systems.

Abstract

Offline reinforcement learning (RL) enables policy training solely on pre-collected data, avoiding direct environment interaction, a crucial benefit for energy-constrained embodied AI applications. Although Artificial Neural Network (ANN)-based methods perform well in offline RL, their high computational and energy demands motivate the exploration of more efficient alternatives. Spiking Neural Networks (SNNs) show promise for such tasks, given their low power consumption. In this work, we introduce Decision SpikeFormer (DSFormer), the first spike-driven transformer model designed to tackle offline RL via sequence modeling. Unlike existing SNN transformers, which focus on spatial dimensions for vision tasks, DSFormer uses Temporal Spiking Self-Attention (TSSA) and Positional Spiking Self-Attention (PSSA) to capture the temporal and positional dependencies essential for sequence modeling in RL. Additionally, we propose Progressive Threshold-dependent Batch Normalization (PTBN), which combines the benefits of LayerNorm and BatchNorm to preserve temporal dependencies while maintaining the spiking nature of SNNs. Comprehensive results on the D4RL benchmark show DSFormer's superiority over both SNN and ANN counterparts while achieving 78.4% energy savings, highlighting its advantages in both energy efficiency and task performance. Code and models are publicly available at https://wei-nijuan.github.io/DecisionSpikeFormer.
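The abstract attributes the low power consumption of SNNs to their spiking nature. As a generic illustration only, and not DSFormer's neuron model, attention mechanism, or normalization, the following minimal sketch simulates a single leaky integrate-and-fire (LIF) neuron and shows why binary spikes are cheap to process: a weighted sum over spikes reduces to adding the weights at positions where a spike occurred. The function names `lif_spikes` and `sparse_weighted_sum`, and the `threshold` and `decay` values, are hypothetical choices made for this example.

```python
def lif_spikes(inputs, threshold=1.0, decay=0.5):
    """Simulate one leaky integrate-and-fire neuron (illustrative, assumed parameters).

    At each step the membrane potential leaks by `decay`, integrates the
    input current, and emits a binary spike when it reaches `threshold`.
    """
    membrane = 0.0
    spikes = []
    for current in inputs:
        membrane = decay * membrane + current  # leak, then integrate
        if membrane >= threshold:
            spikes.append(1)
            membrane = 0.0  # hard reset after firing
        else:
            spikes.append(0)
    return spikes


def sparse_weighted_sum(spikes, weights):
    """Weighted sum over binary spikes using additions only.

    Because each spike is 0 or 1, no multiplications are needed:
    weights are accumulated only where a spike occurred.
    """
    return sum(w for s, w in zip(spikes, weights) if s)


spikes = lif_spikes([0.6, 0.8, 0.1, 1.2, 0.0])
print(spikes)  # [0, 1, 0, 1, 0]
print(sparse_weighted_sum(spikes, [0.5, -1.0, 2.0, 3.0, 4.0]))  # 2.0
```

In this toy run only two of the five time steps produce a spike, so only two accumulations are performed. This sparsity is the general reason spike-driven models are considered energy efficient; the 78.4% figure reported above comes from the authors' own evaluation and is not derived from this sketch.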
