
Selective Attention-based Modulation for Continual Learning

Giovanni Bellitto, Federica Proietto Salanitri, Matteo Pennisi, Matteo Boschini, Angelo Porrello, Simone Calderara, Simone Palazzo, Concetto Spampinato

TL;DR

The paper tackles catastrophic forgetting in online continual learning by introducing SAM, a biologically-inspired selective attention mechanism that modulates a classification network with a saliency-prediction branch. It uses a two-branch architecture with a shared-alignment saliency encoder and a multiplicative feature modulation, optimized with a combined loss $\mathcal{L}=\mathcal{L}_s + \lambda\mathcal{L}_c$ and with gradients stopped from the classifier loss to the saliency encoder. Experiments on Split Mini-ImageNet and Split FG-ImageNet show SAM consistently boosts performance of state-of-the-art online CL methods (up to ~20 percentage points) and enhances robustness to spurious features and adversarial perturbations. The results support the neuro-inspired view that attention mechanisms can be leveraged to preserve past knowledge while efficiently learning new tasks, and point to extensions to heterogeneous architectures and broader low-level vision tasks.
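The modulation and loss described above can be illustrated with a minimal, framework-free sketch. It assumes the saliency and classification features are flat vectors of equal length and that the modulation is an element-wise product; the function names (`modulate`, `combined_loss`, `classifier_grads`) are hypothetical and are not taken from the authors' code. The stop-gradient is written out by hand: the classification loss sends a gradient to the classification features only, and the saliency branch receives none from it.

```python
def modulate(class_feats, sal_feats):
    """Multiplicative modulation: element-wise product of the two feature vectors."""
    return [c * s for c, s in zip(class_feats, sal_feats)]


def combined_loss(loss_s, loss_c, lam):
    """Combined objective L = L_s + lambda * L_c, as stated in the TL;DR."""
    return loss_s + lam * loss_c


def classifier_grads(grad_wrt_modulated, sal_feats):
    """Backpropagate the classification loss through the modulation.

    For m = c * s the product rule gives dm/dc = s, so the classification
    branch receives grad * s. The saliency branch is treated as a constant
    for this loss (stop-gradient), so it receives zeros.
    """
    grad_class = [g * s for g, s in zip(grad_wrt_modulated, sal_feats)]
    grad_sal = [0.0 for _ in sal_feats]  # assumption: explicit stop-gradient
    return grad_class, grad_sal


# Illustrative values only (not from the paper).
class_feats = [0.5, -1.0, 2.0]
sal_feats = [1.0, 0.0, 0.5]
modulated = modulate(class_feats, sal_feats)
total = combined_loss(loss_s=0.2, loss_c=1.5, lam=0.5)
grad_class, grad_sal = classifier_grads([1.0, 1.0, 1.0], sal_feats)
```

In an autodiff framework the same effect would typically come from detaching the saliency features before the product, so that only the saliency loss updates the saliency encoder.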

Abstract

We present SAM, a biologically-plausible selective attention-driven modulation approach to enhance classification models in a continual learning setting. Inspired by neurophysiological evidence that the primary visual cortex does not contribute to object manifold untangling for categorization and that primordial attention biases are still embedded in the modern brain, we propose to employ auxiliary saliency prediction features as a modulation signal to drive and stabilize the learning of a sequence of non-i.i.d. classification tasks. Experimental results confirm that SAM effectively enhances the performance (in some cases by up to about twenty percentage points) of state-of-the-art continual learning methods, both in class-incremental and task-incremental settings. Moreover, we show that attention-based modulation successfully encourages the learning of features that are more robust to the presence of spurious features and to adversarial attacks than baseline methods. Code is available at: https://github.com/perceivelab/SAM.

Paper Structure

This paper contains 12 sections, 4 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Comparison between the forgetting-free behavior of saliency prediction and the typical catastrophic forgetting observed on classification tasks in continual learning scenarios. Saliency accuracy (measured as similarity; Bylinskii et al., 2018) improves as the saliency network is presented with more tasks, while classification accuracy drops. This suggests that saliency detection is an i.i.d. task even in the presence of a non-i.i.d. data distribution. Images on the $x$ axis show how predicted saliency maps are approximately constant over tasks.
  • Figure 2: Architecture of the proposed selective attention-based modulation (SAM) strategy. The classification backbone is paired with a saliency prediction network that, given its capability of being forgetting-free, aims at adjusting the learned classification features in order to mitigate overall forgetting.
  • Figure 3: Saliency prediction accuracy, measured in terms of the Sim, CC and KLD metrics, in continual learning settings on the Split Mini-ImageNet and Split FG-ImageNet benchmarks.
  • Figure 4: Qualitative comparison of attribution maps computed through GradCAM (first row) and the saliency maps produced by the saliency predictor $S$ (second row) during continual training on a sequence of 20 tasks. GradCAM attribution maps show significant forgetting, while saliency maps tend to improve steadily during training.
  • Figure 5: Comparison of SAM to alternative saliency integration strategies. SIM modulates input images by saliency maps. SAI provides saliency maps as an additional input channel to the classification network. LSM merges classification and saliency features through a learnable convolutional layer.
  • ...and 1 more figure
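
The Sim, CC and KLD metrics named in the Figure 3 caption are commonly defined in the saliency-benchmarking literature as histogram intersection, Pearson correlation, and Kullback-Leibler divergence between a predicted and a ground-truth saliency map. The sketch below follows those common definitions and assumes each map is a flattened list of non-negative values; the function names and the `EPS` constant are illustrative choices, and the paper's own evaluation code may differ in normalization details.

```python
import math

EPS = 1e-12  # assumption: small constant to avoid log(0) and division by zero


def _to_distribution(values):
    """Normalize a non-negative map so that its entries sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def similarity(pred, target):
    """Sim: histogram intersection of the two normalized maps (1 = identical)."""
    p, q = _to_distribution(pred), _to_distribution(target)
    return sum(min(a, b) for a, b in zip(p, q))


def correlation_coefficient(pred, target):
    """CC: Pearson correlation between the maps (undefined for constant maps)."""
    n = len(pred)
    mean_p, mean_t = sum(pred) / n, sum(target) / n
    cov = sum((a - mean_p) * (b - mean_t) for a, b in zip(pred, target))
    var_p = sum((a - mean_p) ** 2 for a in pred)
    var_t = sum((b - mean_t) ** 2 for b in target)
    return cov / math.sqrt(var_p * var_t)


def kl_divergence(pred, target):
    """KLD: divergence of the prediction from the target (0 = identical, lower is better)."""
    p, q = _to_distribution(pred), _to_distribution(target)
    return sum(b * math.log(EPS + b / (a + EPS)) for a, b in zip(p, q))


# Illustrative values only (not from the paper).
pred = [1.0, 2.0, 3.0, 4.0]
target = [1.0, 2.0, 3.0, 4.0]
scores = (similarity(pred, target),
          correlation_coefficient(pred, target),
          kl_divergence(pred, target))
```

Higher Sim and CC and lower KLD indicate a prediction closer to the ground truth.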