Temporal Regularization Training: Unleashing the Potential of Spiking Neural Networks
Boxuan Zhang, Zhen Xu, Kuan Tao
TL;DR
Spiking Neural Networks (SNNs) promise event-driven, low-power computation but struggle with temporal gradient vanishing and overfitting during direct training. The authors introduce Temporal Regularization Training (TRT), a time-decaying regularizer that imposes stronger constraints on early timesteps to sustain gradient flow and guide learning. The work provides theoretical analysis linking TRT to mitigation of gradient vanishing and to temporal information concentration (TIC), and validates the method with extensive experiments on static and neuromorphic datasets, achieving state-of-the-art or competitive accuracy and flatter loss landscapes. Fisher-information trajectories under TRT reveal earlier timesteps becoming information-rich, supporting improved generalization and energy-efficient SNNs; overall, TRT offers a principled approach to leverage temporal dynamics for robust SNN performance in neuromorphic vision tasks.
Abstract
Spiking Neural Networks (SNNs) have received widespread attention due to their event-driven and low-power characteristics, which make them particularly effective for processing neuromorphic data. Recent studies have shown that directly trained SNNs suffer from severe temporal gradient vanishing and overfitting, which fundamentally constrain their performance and generalizability. This paper presents a temporal regularization training (TRT) method, designed to unleash the generalization and performance potential of SNNs through a time-decaying regularization mechanism that imposes stronger constraints on early timesteps. We perform a theoretical analysis showing that TRT mitigates temporal gradient vanishing. To validate the effectiveness of TRT, we conduct experiments on both static image datasets and dynamic neuromorphic datasets. Analysis of the results demonstrates that TRT effectively mitigates overfitting and helps SNNs converge to flatter local minima with better generalizability. Furthermore, we establish a theoretical interpretation of TRT's temporal regularization mechanism by analyzing the temporal information dynamics inside SNNs. We track the Fisher information of SNNs during training and show that it progressively concentrates in early timesteps. The time-decaying regularization mechanism in TRT guides the network to learn robust features in these information-rich early timesteps, thereby leading to significant improvements in model generalization.
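The abstract describes the time-decaying regularization only in words, so the following is a minimal illustrative sketch rather than the authors' formulation. It assumes an exponentially decaying weight `lam0 * decay**t` and a squared-logit penalty as a stand-in regularizer; the function names, the penalty term, and the default values are all hypothetical and chosen only to show how a per-timestep penalty can weigh early timesteps more heavily.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def decay_weights(num_steps, lam0=0.1, decay=0.5):
    """Hypothetical schedule: lam0 * decay**t, largest at t = 0."""
    return lam0 * decay ** np.arange(num_steps)

def time_decayed_loss(logits, labels, lam0=0.1, decay=0.5):
    """Mean per-timestep cross-entropy plus a time-decayed penalty.

    logits: array of shape (T, B, C), one output per timestep.
    labels: integer class ids of shape (B,).
    The squared-logit penalty is an assumed placeholder, not the
    regularizer defined in the paper.
    """
    T, B, C = logits.shape
    probs = softmax(logits)
    # Cross-entropy at every timestep, shape (T, B).
    ce = -np.log(probs[:, np.arange(B), labels] + 1e-12)
    # Placeholder penalty on the outputs, shape (T, B).
    penalty = (logits ** 2).mean(axis=-1)
    # Weight per timestep, broadcast over the batch: shape (T, 1).
    lam = decay_weights(T, lam0, decay)[:, None]
    return float((ce + lam * penalty).mean())

# Usage: 4 timesteps, batch of 3, 5 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3, 5))
labels = np.array([0, 2, 4])
loss = time_decayed_loss(logits, labels)
```

With `decay < 1`, the penalty at the first timestep carries the largest weight and shrinks geometrically afterwards, which mirrors the "stronger constraints on early timesteps" idea under the stated assumptions.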
