Revisiting Direct Encoding: Learnable Temporal Dynamics for Static Image Spiking Neural Networks
Huaxu He
TL;DR
Static-image SNNs lack inherent temporal dynamics, which makes direct encoding prone to temporal collapse. The authors show that the gap to rate-based encodings arises largely from convolutional learnability and surrogate gradient choices rather than from the encoding principles themselves, and they introduce a minimal learnable temporal encoding that uses adaptive phase shifts to inject temporal variation. Validation on CIFAR-10/100 and VOC demonstrates improved performance at ultra-low time steps and shows that temporal encoding can benefit tasks requiring temporal processing, such as detection, while maintaining strong classification performance. Overall, the work unifies the encodings under a framework that combines learnable convolutional front-ends with a learnable temporal mechanism to restore meaningful temporal dynamics for static inputs.
Abstract
Handling static images that lack inherent temporal dynamics remains a fundamental challenge for spiking neural networks (SNNs). In directly trained SNNs, static inputs are typically repeated across time steps, causing the temporal dimension to collapse into a rate-like representation and preventing meaningful temporal modeling. This work revisits the reported performance gap between direct and rate-based encodings and shows that it stems primarily from convolutional learnability and surrogate gradient formulations rather than from the encoding schemes themselves. To illustrate this mechanism-level clarification, we introduce a minimal learnable temporal encoding that adds adaptive phase shifts to induce meaningful temporal variation from static inputs.
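As a rough illustration of the collapse described above, the following Python sketch contrasts plain repetition of a static input with a phase-shifted variant. The sinusoidal modulation, the `amplitude` parameter, and all function names are assumptions made for illustration only; the abstract does not specify the exact form of the encoding. In a trained model the phases would be learnable parameters, whereas here they are fixed floats.

```python
import math


def direct_encode(pixels, num_steps):
    """Direct encoding: present the same static input at every time step."""
    return [list(pixels) for _ in range(num_steps)]


def phase_shift_encode(pixels, num_steps, phases, amplitude=0.5):
    """Hypothetical phase-shifted encoding (an assumption, not the paper's formula).

    Each element i is modulated over time as
        x_t[i] = x[i] * (1 + amplitude * sin(2*pi*t/num_steps + phases[i])).
    """
    frames = []
    for t in range(num_steps):
        angle = 2.0 * math.pi * t / num_steps
        frames.append(
            [x * (1.0 + amplitude * math.sin(angle + p)) for x, p in zip(pixels, phases)]
        )
    return frames


def temporal_variance(frames):
    """Variance across time steps, averaged over input elements."""
    num_steps = len(frames)
    num_elements = len(frames[0])
    total = 0.0
    for i in range(num_elements):
        values = [frames[t][i] for t in range(num_steps)]
        mean = sum(values) / num_steps
        total += sum((v - mean) ** 2 for v in values) / num_steps
    return total / num_elements


if __name__ == "__main__":
    pixels = [0.2, 0.8, 0.5]                # toy "image" with three intensities
    phases = [0.0, math.pi / 2, math.pi]    # stand-ins for learnable phases
    print("direct:", temporal_variance(direct_encode(pixels, 4)))
    print("phase-shifted:", temporal_variance(phase_shift_encode(pixels, 4, phases)))
```

With repetition, every time step carries identical input, so the variance over time is zero and the time axis adds no information beyond a rate. The phase-shifted variant gives each element a different trajectory across steps while, for this choice of modulation over a full period, leaving its time-averaged intensity unchanged.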
