Learning without Isolation: Pathway Protection for Continual Learning
Zhikang Chen, Abudukelimu Wuerkaixi, Sen Cui, Haoxuan Li, Ding Li, Jingfeng Zhang, Bo Han, Gang Niu, Houfang Liu, Yi Yang, Sifan Yang, Changshui Zhang, Tianling Ren
TL;DR
This work addresses catastrophic forgetting in continual learning by shifting the focus from protecting individual parameters to protecting information pathways within a network. It introduces Learning without Isolation (LwI), a data-free framework that fuses the models from sequential tasks via graph matching to align their activation pathways. Layer-wise similarity objectives (high similarity in shallow layers, low similarity in deep layers) protect old-task knowledge, while activation sparsity leaves room for learning new tasks. Empirical results on CIFAR-100 and Tiny-ImageNet with ResNet32/ResNet18 show that LwI outperforms regularization-, rehearsal-, and architecture-based baselines, with larger gains as model size and dataset complexity increase; ablations corroborate the importance of final-layer diversification and distillation. The approach offers a privacy-preserving pathway-protection mechanism for continual learning that can adapt to task-agnostic settings and may scale to larger models, motivating future work on speeding up graph matching and applying the idea to broader domains such as large language models.
Abstract
Deep networks are prone to catastrophic forgetting during sequential task learning, i.e., they lose knowledge about old tasks upon learning new ones. Continual learning (CL) has emerged to address this problem, and existing methods focus mostly on regulating or protecting the parameters associated with previous tasks. However, parameter protection is often impractical: either the number of parameters needed to store old-task knowledge grows linearly with the number of tasks, or it becomes hard to preserve the parameters that carry that knowledge. In this work, we bring a dual perspective from neuroscience and physics to CL: across the whole network, the pathways matter more than the parameters when it comes to the knowledge acquired from old tasks. Following this perspective, we propose a novel CL framework, learning without isolation (LwI), in which model fusion is formulated as graph matching and the pathways occupied by old tasks are protected without being isolated. Thanks to the sparsity of activation channels in a deep network, LwI can adaptively allocate available pathways to a new task, realizing pathway protection and addressing catastrophic forgetting in a parameter-efficient manner. Experiments on popular benchmark datasets demonstrate the superiority of the proposed LwI.
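
To make the "align, then fuse" idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single layer whose weights are given as one row per output channel, scores channel pairs by cosine similarity, and searches all permutations by brute force, which is a simplified assignment problem standing in for the graph matching formulation the paper uses. The function names (`cosine`, `best_alignment`, `fuse`), the `maximize` flag, and the mixing weight `alpha` are all hypothetical and chosen for illustration only.

```python
from itertools import permutations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_alignment(old_w, new_w, maximize=True):
    """Return perm such that new_w[perm[i]] is paired with old_w[i].

    Brute-force search over all permutations; only feasible for a few
    channels. maximize=True pairs the most similar channels, while
    maximize=False pairs the least similar ones (assumed here as a crude
    stand-in for a low-similarity objective).
    """
    n = len(old_w)

    def score(perm):
        return sum(cosine(old_w[i], new_w[perm[i]]) for i in range(n))

    pick = max if maximize else min
    return pick(permutations(range(n)), key=score)


def fuse(old_w, new_w, perm, alpha=0.5):
    """Average each old channel with its aligned new channel."""
    return [
        [alpha * a + (1 - alpha) * b for a, b in zip(old_w[i], new_w[perm[i]])]
        for i in range(len(old_w))
    ]


# Toy layer: the new model learned the same two channels, swapped and rescaled.
old_w = [[1.0, 0.0], [0.0, 1.0]]
new_w = [[0.0, 2.0], [2.0, 0.0]]

perm = best_alignment(old_w, new_w)   # (1, 0): undo the swap
fused = fuse(old_w, new_w, perm)      # [[1.5, 0.0], [0.0, 1.5]]
```

In a multi-layer network, permuting the output channels of one layer would also require permuting the matching input dimension of the next layer so that the network's function is unchanged; this sketch omits that step. Averaging without alignment would instead mix unrelated channels, which is the kind of pathway interference the alignment is meant to avoid.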
