Traces Propagation: Memory-Efficient and Scalable Forward-Only Learning in Spiking Neural Networks

Lorenzo Pes, Bojian Yin, Sander Stuijk, Federico Corradi

TL;DR

Traces Propagation (TP) is a forward-only, memory-efficient learning rule for spiking neural networks that combines eligibility traces for temporal credit assignment with a layer-wise contrastive loss for spatial credit assignment, without auxiliary matrices. TP achieves state-of-the-art performance among fully local methods on N-MNIST and SHD and shows competitive results on DVS-GESTURE and DVS-CIFAR10, while scaling to deep architectures like VGG-9 and enabling practical on-device fine-tuning for edge scenarios such as Google Speech Commands. It offers favorable memory scaling, with a per-step memory cost of O(LH) and a time complexity of O(B^2LH), which is an advantage when the number of output classes is large relative to the batch size. Overall, TP advances memory-efficient, scalable, and edge-friendly learning in SNNs, narrowing the gap between fully local rules and real-world deployment needs.

Abstract

Spiking Neural Networks (SNNs) provide an efficient framework for processing dynamic spatio-temporal signals and for investigating the learning principles underlying biological neural systems. A key challenge in training SNNs is to solve both spatial and temporal credit assignment. The dominant approach for training SNNs is Backpropagation Through Time (BPTT) with surrogate gradients. However, BPTT is in stark contrast with the spatial and temporal locality observed in biological neural systems and leads to high computational and memory demands, limiting efficient training strategies and on-device learning. Although existing local learning rules achieve local temporal credit assignment by leveraging eligibility traces, they fail to address spatial credit assignment without resorting to auxiliary layer-wise matrices, which increase memory overhead and hinder scalability, especially on embedded devices. In this work, we propose Traces Propagation (TP), a forward-only, memory-efficient, scalable, and fully local learning rule that combines eligibility traces with a layer-wise contrastive loss without requiring auxiliary layer-wise matrices. TP outperforms other fully local learning rules on the N-MNIST and SHD datasets. On more complex datasets such as DVS-GESTURE and DVS-CIFAR10, TP shows competitive performance and scales effectively to deeper SNN architectures such as VGG-9, while providing favorable memory scaling compared to prior fully local scalable rules for datasets with a large number of classes. Finally, we show that TP is well suited for practical fine-tuning tasks, such as keyword spotting on the Google Speech Commands dataset, thus paving the way for efficient learning at the edge.
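To make the combination of eligibility traces and a layer-wise contrastive loss more concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a trace is an exponentially decaying sum of spikes, that similarity is measured with the cosine, and that the local loss is a softmax cross-entropy over pairwise similarities between input traces and target traces in a batch. The names `update_trace`, `cosine`, `local_contrastive_loss`, `decay`, and `temperature` are hypothetical and do not come from the paper.

```python
import math

def update_trace(trace, spikes, decay=0.9):
    """One forward-in-time trace update (assumed form): decay the old trace, add new spikes."""
    return [decay * e + s for e, s in zip(trace, spikes)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def local_contrastive_loss(input_traces, target_traces, labels, temperature=0.5):
    """Assumed layer-local loss: mean of -log(same-class similarity mass / total mass).

    input_traces[i] and target_traces[i] belong to sample i, whose class is labels[i].
    The loss is low when each input trace is closest to target traces of its own class.
    """
    total = 0.0
    for i, trace in enumerate(input_traces):
        weights = [math.exp(cosine(trace, t) / temperature) for t in target_traces]
        positive = sum(w for w, y in zip(weights, labels) if y == labels[i])
        total += -math.log(positive / sum(weights))
    return total / len(input_traces)

# Build a trace from two time steps of spikes for a 2-neuron layer.
trace = update_trace([0.0, 0.0], [1, 0])
trace = update_trace(trace, [0, 1])

# Two samples from different classes: aligned vs. swapped target traces.
inputs = [[1.0, 0.0], [0.0, 1.0]]
aligned = local_contrastive_loss(inputs, [[1.0, 0.0], [0.0, 1.0]], [0, 1])
swapped = local_contrastive_loss(inputs, [[0.0, 1.0], [1.0, 0.0]], [0, 1])
print(trace, aligned < swapped)
```

In this sketch everything the loss needs is available at the layer and at the current time step, which is the sense in which such a rule is local in both space and time. The actual trace dynamics and loss used by TP may differ from these assumed forms.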

Paper Structure

This paper contains 24 sections, 17 equations, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: Locality comparison in learning algorithms. Violet represents spatial credit assignment, while green indicates temporal credit assignment. Left: Non-local learning (e.g., BPTT) relies on a global error signal $E_L$ that requires both spatial and temporal components for the whole sequence. Center: Time-local learning (e.g., E-prop, OSTL, OTTP and S-TLLR) uses eligibility traces $\epsilon_l^t$ to remove temporal dependencies, but still requires a global spatial credit assignment based on $E_l^t$. Right: Fully-local learning (e.g., ETLP, OSTTP, TESS and TP) still uses eligibility traces for temporal credit assignment but uses only local information for the spatial assignment, thus satisfying both temporal and spatial locality.
  • Figure 2: Overview of Traces Propagation. Right: Parallel computational paths. In the green path, the input signal $s_0^t$ is propagated through the network via conventional matrices $W_l$, generating input traces $\epsilon_l^t$ based on spiking activity $s_l^t$. In the purple path, the one-hot encoded target vector $c \in \mathcal{R}^{C}$ is projected to the first layer via $S \in \mathcal{R}^{C \times H_{1}}$ to match the dimensionality of the first input trace (i.e., $\epsilon_1^t$), thus enabling the comparison of input and target traces through the local loss $E_l^t$. For $l>1$, the same matrices $W_l$ are used to propagate the target signal and generate target traces $\tilde{\epsilon}^t_l$ at each layer. Left: Geometric view and trace alignment dynamics. Target traces attract input traces of the same class (green arrows) and repel those from different classes (red arrows), fostering class separation. At the last layers, the input traces of different classes become linearly separable (dashed line).
  • Figure 3: Google Speech Commands on-device fine-tuning (FT) with Traces Propagation. a) Edge deployment pipeline for speech recognition. A neural network is pre-trained on the Google Speech Commands dataset in the cloud (blue circle) and transmitted to the edge device (black arrows). Without fine-tuning, the model can misclassify inputs (e.g., mistaking a "yes" for a "no"). If fine-tuning is not available at the edge, the new user recordings must be sent back to the cloud (red arrows) for retraining, reducing privacy and increasing cost. With on-device fine-tuning (green circle), the model can adapt to the new user without data transmission. b) Fine-tuning accuracy improvement with 1-shot, 5-shot, and all user samples. "Before FT" is the accuracy on GSC for the new user after cloud training and deployment.
  • Figure 4: t-SNE visualization of input traces across layers on the DVS-GESTURE dataset. Panels 1 to 8 show the 2D representation, at the end of training, of the input traces for each layer of a VGG-9 architecture trained on the DVS-GESTURE dataset. In the bottom-left corner of each panel, we report the Silhouette score [shahapure_2020] to quantify the degree of clustering. A value of -1 indicates overlapping or poorly separated clusters, while a value of +1 corresponds to compact and well-separated groups. The Silhouette score increases progressively with layer depth, demonstrating that the network develops increasingly discriminative and separable representations.
  • Figure 5: Relative memory cost of TP compared to TESS as a function of the number of output classes and batch size. A relative memory cost of $G$ indicates that TESS uses $G$ times more memory than TP.
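The Silhouette score reported in Figure 4 can be illustrated with a short sketch. The code below follows the standard definition: for each point, $a$ is the mean distance to the other points in its own cluster, $b$ is the smallest mean distance to any other cluster, and the point's score is $(b - a) / \max(a, b)$. It uses Euclidean distance and toy 2D points. The function name `silhouette_score` and the sample data are illustrative assumptions, not the evaluation code used for the figure.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points; needs at least two distinct labels."""
    classes = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        # Mean distance from point i to every member of each cluster (excluding itself).
        mean_dist = {}
        for c in classes:
            dists = [math.dist(p, q) for j, q in enumerate(points)
                     if labels[j] == c and j != i]
            if dists:
                mean_dist[c] = sum(dists) / len(dists)
        if labels[i] not in mean_dist:
            scores.append(0.0)  # usual convention for a single-point cluster
            continue
        a = mean_dist[labels[i]]
        b = min(d for c, d in mean_dist.items() if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy 2D embedding: two tight groups far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(silhouette_score(points, [0, 0, 1, 1]))  # labels match the groups: close to +1
print(silhouette_score(points, [0, 1, 0, 1]))  # labels cut across the groups: negative
```

A score that rises with layer depth, as the caption describes, would correspond to moving from the second kind of labeling toward the first.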