CATFormer: When Continual Learning Meets Spiking Transformers With Dynamic Thresholds

Vaishnavi Nagabhushana, Kartikay Agrawal, Ayon Borthakur

Abstract

Although deep neural networks perform extremely well in controlled environments, they fail in real-world scenarios where data is not available all at once and the model must adapt to a new data distribution that may or may not follow the initial distribution. Previously acquired knowledge is lost during subsequent updates based on new data, a phenomenon commonly known as catastrophic forgetting. In contrast, the brain can learn without such catastrophic forgetting, irrespective of the number of tasks it encounters. Existing spiking neural networks (SNNs) for class-incremental learning (CIL) suffer a sharp performance drop as tasks accumulate. Here we introduce CATFormer (Context Adaptive Threshold Transformer), a scalable framework that overcomes this limitation. We observe that the key to preventing forgetting in SNNs lies not only in synaptic plasticity but also in modulating neuronal excitability. At the core of CATFormer is the Dynamic Threshold Leaky Integrate-and-Fire (DTLIF) neuron model, which leverages context-adaptive thresholds as the primary mechanism for knowledge retention. This is paired with a Gated Dynamic Head Selection (G-DHS) mechanism for task-agnostic inference. Extensive evaluation on both static (CIFAR-10, CIFAR-100, Tiny-ImageNet) and neuromorphic (CIFAR10-DVS, SHD) datasets shows that CATFormer outperforms existing rehearsal-free CIL algorithms across various task splits, establishing it as an ideal architecture for energy-efficient, true class-incremental learning.

Paper Structure (24 sections, 3 figures, 3 tables, 2 algorithms)

Figures (3)

  • Figure 1: Test performance as the number of trained tasks increases (up to 50 tasks). CATFormer (ours) maintains consistent performance compared with other existing CIL methods implemented on a Spiking Transformer. All methods are evaluated on CIFAR-100.
  • Figure 2: The diagram depicts the full architecture and workflow of CATFormer.
  • Figure 3: Reverse forgetting vs. catastrophic forgetting. Trend comparison (number of tasks vs. accuracy in %) of CATFormer against DSD-SNN [dsdsnn] and the same SpikFormer on [ewc, zenke2017continual, rebuffi2017icarl, der++, rder]. The dotted line represents the average accuracy across tasks as the number of tasks increases.