Temporal Regularization Training: Unleashing the Potential of Spiking Neural Networks

Boxuan Zhang, Zhen Xu, Kuan Tao

TL;DR

Spiking Neural Networks (SNNs) promise event-driven, low-power computation but struggle with temporal gradient vanishing and overfitting during direct training. The authors introduce Temporal Regularization Training (TRT), a time-decaying regularizer that imposes stronger constraints on early timesteps to sustain gradient flow and guide learning. The work provides theoretical analysis linking TRT to mitigation of gradient vanishing and to temporal information concentration (TIC), and validates the method with extensive experiments on static and neuromorphic datasets, achieving state-of-the-art or competitive accuracy and flatter loss landscapes. Fisher-information trajectories under TRT reveal earlier timesteps becoming information-rich, supporting improved generalization and energy-efficient SNNs; overall, TRT offers a principled approach to leverage temporal dynamics for robust SNN performance in neuromorphic vision tasks.
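
To make "a time-decaying regularizer that imposes stronger constraints on early timesteps" concrete, here is a minimal sketch of one way such a loss could be assembled. This section does not give the exact TRT formula, so the exponential decay schedule, the generic per-timestep penalty value, and every name here (`decay_weights`, `time_decayed_loss`, `base`, `rate`) are illustrative assumptions, not the authors' definition.

```python
import math


def decay_weights(num_steps, base=1.0, rate=0.5):
    """One regularization coefficient per timestep, largest at t = 0.

    ASSUMPTION: exponential decay; the source only says "time-decaying".
    """
    return [base * math.exp(-rate * t) for t in range(num_steps)]


def softmax_cross_entropy(logits, label):
    """Numerically stable cross-entropy for a single sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def time_decayed_loss(logits_per_step, label, penalty_per_step,
                      base=1.0, rate=0.5):
    """Average over timesteps of: cross-entropy + decaying weight * penalty.

    ASSUMPTION: `penalty_per_step` holds a precomputed, non-negative
    regularization value for each timestep (for example a squared norm);
    what exactly is penalized is not specified in this section.
    """
    weights = decay_weights(len(logits_per_step), base, rate)
    total = 0.0
    for logits, penalty, w in zip(logits_per_step, penalty_per_step, weights):
        total += softmax_cross_entropy(logits, label) + w * penalty
    return total / len(logits_per_step)


if __name__ == "__main__":
    # Two timesteps, two classes, identical penalty at each step:
    # the penalty at t = 0 contributes more than the one at t = 1.
    print(decay_weights(4))
    print(time_decayed_loss([[0.0, 0.0], [0.0, 0.0]], 0, [1.0, 1.0]))
```

With equal raw penalties at every timestep, the earlier timesteps contribute more to the total, which is the qualitative behavior the summary describes.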

Abstract

Spiking Neural Networks (SNNs) have received widespread attention due to their event-driven and low-power characteristics, making them particularly effective for processing neuromorphic data. Recent studies have shown that directly trained SNNs suffer from severe temporal gradient vanishing and overfitting issues, which fundamentally constrain their performance and generalizability. This paper unveils a temporal regularization training (TRT) method, designed to unleash the generalization and performance potential of SNNs through a time-decaying regularization mechanism that prioritizes early timesteps with stronger constraints. We perform theoretical analysis to reveal TRT's ability to mitigate temporal gradient vanishing. To validate the effectiveness of TRT, we conduct experiments on both static image datasets and dynamic neuromorphic datasets and analyze the results, demonstrating that TRT can effectively mitigate overfitting and help SNNs converge into flatter local minima with better generalizability. Furthermore, we establish a theoretical interpretation of TRT's temporal regularization mechanism by analyzing the temporal information dynamics inside SNNs. We track the Fisher information of SNNs during the training process, showing that Fisher information progressively concentrates in early timesteps. The time-decaying regularization mechanism implemented in TRT effectively guides the network to learn robust features in early timesteps with rich information, thereby leading to significant improvements in model generalization.
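
The claim that Fisher information "concentrates in early timesteps" can be illustrated with a small sketch. It uses the standard empirical estimate of the Fisher trace (the mean squared norm of per-sample log-likelihood gradients) and reports each timestep's share of the total. How the paper attributes Fisher information to individual timesteps is not stated in this section, so the per-timestep gradient layout and the helper names (`fisher_trace`, `temporal_shares`) are assumptions made for illustration only.

```python
def fisher_trace(grads):
    """Empirical Fisher trace: mean over samples of the squared gradient norm.

    `grads` is a list of per-sample gradient vectors (lists of floats).
    """
    return sum(sum(g * g for g in grad) for grad in grads) / len(grads)


def temporal_shares(grads_per_step):
    """Fraction of the total Fisher trace attributed to each timestep.

    ASSUMPTION: `grads_per_step[t]` holds the per-sample gradients
    associated with timestep t; this attribution is hypothetical.
    """
    traces = [fisher_trace(g) for g in grads_per_step]
    total = sum(traces)
    if total == 0.0:
        return [0.0 for _ in traces]
    return [t / total for t in traces]


if __name__ == "__main__":
    # Toy data: gradients at t = 0 are twice as large as at t = 1.
    grads = [
        [[2.0, 0.0], [0.0, 2.0]],  # timestep 0
        [[1.0, 0.0], [0.0, 1.0]],  # timestep 1
    ]
    print(temporal_shares(grads))
```

Recording these shares at several points during training would give a trajectory; a rising share for the first few timesteps would correspond to the concentration effect described in the abstract.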

Paper Structure

This paper contains 43 sections, 25 equations, 7 figures, 6 tables, 1 algorithm.

Figures (7)

  • Figure 1: The accuracy results of different regularization methods on two datasets. For CIFAR100, we set simulation length $T$ to 4, then apply L2 regularization and weight decay both with the value of $2e-5$. For DVS-CIFAR10, we set $T$ to 10, then apply L2 regularization and weight decay both with the value of $4e-5$.
  • Figure 1: Learning curves and accuracy curves on the DVS-CIFAR10 dataset. Dashed lines in (e) and (f) denote the 200th epoch.
  • Figure 2: The accuracy results under different values of $\eta$
  • Figure 3:
  • Figure 4: Learning curves on DVS-CIFAR10 and N-Caltech101. (a-c) depict the learning curves of the TRT, TET, and SDT methods on DVS-CIFAR10, respectively. (d-f) depict the learning curves of the TRT, TET, and SDT methods on N-Caltech101, respectively. Dashed line denotes the 30th epoch. Dashed lines denote the 200th, 30th, 45th and 20th epochs, respectively.
  • ...and 2 more figures