Revisiting Direct Encoding: Learnable Temporal Dynamics for Static Image Spiking Neural Networks

Huaxu He

TL;DR

Static images lack inherent temporal dynamics, so direct encoding in spiking neural networks (SNNs) is prone to temporal collapse. The authors show that the reported gap to rate-based encodings arises largely from convolutional learnability and surrogate gradient choices, not from the encoding principles themselves, and they introduce a minimal learnable temporal encoding that uses adaptive phase shifts to inject temporal variation. Validation on CIFAR-10/100 and VOC shows improved performance at ultra-low time steps and indicates that temporal encoding can help tasks that require temporal processing, such as detection, while maintaining strong classification performance. Overall, the work places the encodings in a single framework that combines a learnable convolutional front-end with a learnable temporal mechanism to restore meaningful temporal dynamics for static inputs.

Abstract

Handling static images that lack inherent temporal dynamics remains a fundamental challenge for spiking neural networks (SNNs). In directly trained SNNs, static inputs are typically repeated across time steps, causing the temporal dimension to collapse into a rate-like representation and preventing meaningful temporal modeling. This work revisits the reported performance gap between direct and rate-based encodings and shows that it primarily stems from convolutional learnability and surrogate gradient formulations rather than the encoding schemes themselves. To illustrate this mechanism-level clarification, we introduce a minimal learnable temporal encoding that adds adaptive phase shifts to induce meaningful temporal variation from static inputs.
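
The collapse described in the abstract can be illustrated with a single neuron. The sketch below is not taken from the paper: it assumes one common discrete leaky integrate-and-fire (LIF) update with hard reset, and the names `direct_encode`, `lif_spike_train`, `tau`, and `v_th` are hypothetical. It shows that when the same value is repeated at every time step, the only thing the spike train carries is how often the neuron fires.

```python
def direct_encode(x, num_steps):
    """Direct encoding: repeat the same static value at every time step."""
    return [x] * num_steps


def lif_spike_train(inputs, tau=2.0, v_th=1.0):
    """Simulate one LIF neuron with hard reset (assumed update rule).

    tau and v_th are illustrative constants, not values from the paper.
    """
    v = 0.0
    spikes = []
    for x in inputs:
        v = v + (x - v) / tau      # leaky integration of the input
        if v >= v_th:
            spikes.append(1)       # emit a spike
            v = 0.0                # hard reset after firing
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    for intensity in (0.5, 1.5, 3.0):
        frames = direct_encode(intensity, num_steps=4)
        train = lif_spike_train(frames)
        print(intensity, train, "spike count:", sum(train))
```

Because every frame is identical, the spike pattern is fully determined by the input intensity, and a stronger input simply fires more often. This is the rate-like behavior that the abstract refers to as temporal collapse.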

Paper Structure

This paper contains 17 sections, 8 equations, 2 figures, and 4 tables.

Figures (2)

  • Figure 1: Comparison of traditional encoders, direct encoding, and the proposed learnable temporal encoding. Traditional encoders produce time-varying inputs via fixed rules, while direct encoding replicates static inputs, causing temporal collapse. The proposed method restores temporal variation through learnable phase shifts. Here, $x$ represents the input feature maps at each time step.
  • Figure 2: Example of the learnable temporal encoding (phase encoding). Values $\theta_0 = 1$, $\theta_1 = 0.66$, and $\theta_2 = 0.33$ are illustrative; actual $\theta_t$ are learnable.
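
The contrast drawn in the two captions can be sketched in a few lines. This is a rough illustration only: the captions do not state how $\theta_t$ enters the encoding, so the sketch assumes it acts as a per-time-step multiplicative factor on the input, and the function names `direct_encode` and `phase_encode` are hypothetical. The $\theta_t$ values are the illustrative ones from Figure 2; in the proposed method they would be learned parameters rather than fixed constants.

```python
def direct_encode(x, num_steps):
    """Direct encoding: every time step receives an identical copy of x."""
    return [list(x) for _ in range(num_steps)]


def phase_encode(x, thetas):
    """Assumed form of the temporal encoding: scale x by theta_t at step t.

    The multiplicative form is an assumption made for illustration; the
    source only says that theta_t is a learnable phase parameter.
    """
    return [[theta * value for value in x] for theta in thetas]


if __name__ == "__main__":
    x = [0.5, 1.0]               # a tiny stand-in for an input feature map
    thetas = [1.0, 0.66, 0.33]   # illustrative values from Figure 2

    direct = direct_encode(x, num_steps=len(thetas))
    phased = phase_encode(x, thetas)

    # Direct encoding gives identical frames; the phase variant does not.
    print("direct frames identical:", all(f == direct[0] for f in direct))
    print("phase frames identical: ", all(f == phased[0] for f in phased))
```

Under this assumption, each time step sees a differently scaled version of the same image, so downstream spiking layers receive inputs that actually vary over time.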