Learning without Isolation: Pathway Protection for Continual Learning

Zhikang Chen, Abudukelimu Wuerkaixi, Sen Cui, Haoxuan Li, Ding Li, Jingfeng Zhang, Bo Han, Gang Niu, Houfang Liu, Yi Yang, Sifan Yang, Changshui Zhang, Tianling Ren

TL;DR

This work addresses catastrophic forgetting in continual learning by shifting the focus from protecting individual parameters to protecting information pathways within a network. It introduces Learning without Isolation (LwI), a data-free framework that fuses the models from sequential tasks via graph matching to align activation pathways. The matching objective is layer-wise: high similarity in shallow layers and low similarity in deep layers, which protects old-task knowledge while activation sparsity leaves room for learning new tasks. Empirical results on CIFAR-100 and Tiny-ImageNet with ResNet32/ResNet18 show that LwI outperforms regularization-, rehearsal-, and architecture-based baselines, with larger gains as model size and dataset complexity increase; ablations corroborate the importance of final-layer diversification and distillation. The approach offers a privacy-preserving pathway-protection mechanism for continual learning that can adapt to task-agnostic settings and potentially scale to larger models, motivating future work on speeding up graph matching and applying the idea to broader domains such as large language models.

Abstract

Deep networks are prone to catastrophic forgetting during sequential task learning, i.e., losing knowledge of old tasks upon learning new ones. Continual learning (CL) has emerged to address this, and existing methods focus mostly on regulating or protecting the parameters associated with previous tasks. However, parameter protection is often impractical: either the number of parameters needed to store old-task knowledge grows linearly with the number of tasks, or it becomes hard to preserve the parameters related to old-task knowledge. In this work, we bring a dual perspective from neuroscience and physics to CL: across the whole network, the pathways matter more than the parameters when it comes to the knowledge acquired from old tasks. Following this perspective, we propose a novel CL framework, learning without isolation (LwI), where model fusion is formulated as graph matching and the pathways occupied by old tasks are protected without being isolated. Thanks to the sparsity of activation channels in a deep network, LwI can adaptively allocate available pathways to a new task, realizing pathway protection and addressing catastrophic forgetting in a parameter-efficient manner. Experiments on popular benchmark datasets demonstrate the superiority of the proposed LwI.

Paper Structure

This paper contains 40 sections, 19 equations, 6 figures, 20 tables, 3 algorithms.

Figures (6)

  • Figure 1: Left: An illustrative comparison between our method and the parameter-protection approach, highlighting the key distinctions between them. Bottom right: A performance comparison between our method and WSN (Kang et al., 2022). Top right: Our method can adapt even in task-agnostic scenarios, whereas the parameter-protection approach requires task identifiers for effective recognition.
  • Figure 2: Left: A comparison between our approach and LwF (Li and Hoiem, 2017). The activation values in the last convolution layer of each model are displayed across channels. The channels have been rearranged along the horizontal axis for clearer demonstration. Bottom right: An explanatory legend for the horizontal axis (channel index) of the left figure. Top right: A comparison between our method and LwF in the task-aware setting, showing that our accuracy remains largely unchanged while LwF suffers a substantial decline.
  • Figure 3: The overall structure of our proposed LwI algorithm. In the right diagram, we represent the deep network in four parts: L1 corresponds to the input layer, L2 to the shallow layers, L3 to the deeper layers, and L4 to the output layer. The channels in the deep network are analogous to nodes in a graph, and the connections between channels correspond to the edges of the graph. On the left side, L1 requires no matching operation. L4 only requires appending the output heads of the different tasks. L2 matches channels with maximum similarity. Conversely, L3 matches channels with minimum similarity.
  • Figure 4: Task-aware forgetting rates of different methods.
  • Figure 5: An illustration of graph matching. The two graphs to be matched, Graph X and Graph Y, are depicted in the left figure, each annotated with its nodes and some of its connections. The diagrams on the right show the similarity matrices between nodes and between edges.
  • ...and 1 more figures
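
The layer-wise matching described in the Figure 3 caption (maximum-similarity matching in shallow layers, minimum-similarity matching in deeper layers) can be illustrated with a small sketch. This is not the authors' implementation. The paper formulates fusion as graph matching over both node and edge similarities, whereas the sketch below substitutes a simpler node-only assignment solved by brute force. The cosine similarity measure, the averaging rule in `fuse`, the `alpha` weight, and all function names are assumptions made for illustration.

```python
from itertools import permutations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two flattened channel weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_channels(old_layer, new_layer, maximize):
    """Return perm, where perm[i] is the new-model channel paired with old channel i.

    Brute force over all permutations: acceptable for a toy layer only.
    Node similarity only; edge similarities (full graph matching) are ignored.
    """
    n = len(old_layer)
    sim = [[cosine(old_layer[i], new_layer[j]) for j in range(n)] for i in range(n)]

    def total_similarity(perm):
        return sum(sim[i][perm[i]] for i in range(n))

    pick = max if maximize else min
    return list(pick(permutations(range(n)), key=total_similarity))


def fuse(old_layer, new_layer, perm, alpha=0.5):
    """Average each old channel with its matched new channel.

    The averaging rule and alpha are hypothetical, chosen only for illustration.
    """
    return [
        [alpha * a + (1 - alpha) * b for a, b in zip(old_layer[i], new_layer[perm[i]])]
        for i in range(len(old_layer))
    ]


# Toy layer with two channels, each holding two weights.
old = [[1.0, 0.0], [0.0, 1.0]]
new = [[0.0, 1.0], [1.0, 0.0]]

shallow_perm = match_channels(old, new, maximize=True)   # shallow layers (L2)
deep_perm = match_channels(old, new, maximize=False)     # deeper layers (L3)

shallow_fused = fuse(old, new, shallow_perm)
deep_fused = fuse(old, new, deep_perm)
```

On this toy input, the shallow-layer matching pairs each old channel with its identical counterpart in the new model (`shallow_perm` is `[1, 0]`), while the deep-layer matching pairs each old channel with the least similar one (`deep_perm` is `[0, 1]`). For realistic channel counts, a polynomial-time assignment solver or a dedicated graph-matching solver would replace the brute-force search.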