Overcoming Growth-Induced Forgetting in Task-Agnostic Continual Learning
Yuqing Zhao, Jiannong Cao, Divya Saxena, Xiaoyun Liu, Changlin Song, Bo Yuan, Julie McCann
TL;DR
Growing model capacity in task-agnostic continual learning can trigger forgetting when the entire grown model is used for inference. The authors identify this growth-induced forgetting and show that existing growth strategies differ in their forgetting risk, with layer expansion offering a way to reduce forgetting. They propose SparseGrow, which combines layer expansion, gradient gating, and sparse training and initialization to enable targeted updates and controlled plasticity. Extensive experiments on domain- and class-incremental datasets show that SparseGrow achieves high adaptability while minimizing forgetting, outperforming baselines with modest parameter overhead.
Abstract
In continual learning (CL), model growth enhances adaptability to new data. However, when model growth is applied improperly, especially in task-agnostic CL, where the entire grown model is used for inference, it can severely degrade learned knowledge, a problem we term growth-induced forgetting. Most existing methods that adopt model growth to improve adaptability overlook this forgetting issue, which compromises knowledge retention and makes them unsuitable for task-agnostic settings. To promote both adaptability and knowledge retention under model growth, we identify the key ingredients: gradient sparsity and parameter sparsity. We introduce SparseGrow, which increases gradient sparsity through layer expansion and gradient gating, focusing updates on specific parameters while preserving critical ones and thus inhibiting forgetting. SparseGrow also promotes parameter sparsity through sparse initialization and sparse training, giving better control over model plasticity and improving adaptability to new data. Extensive experiments across diverse datasets, task-agnostic settings, and a large number of tasks demonstrate the necessity of controlled layer expansion and validate the effectiveness of SparseGrow in achieving high adaptability while minimizing forgetting in continual learning. By enabling model growth with sparsified gradients and parameters, SparseGrow paves the way for scalable lifelong learning systems capable of continual adaptation with better knowledge retention.
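The abstract describes gradient sparsity and parameter sparsity only at a high level. The sketch below is a minimal toy illustration in plain Python under a simple scheme of our own choosing, not the authors' SparseGrow implementation. The model is a list of weight matrices. "Layer expansion" appends one new square layer. "Gradient gating" means closing a per-layer gate so that pre-existing layers receive no updates. "Sparse initialization and training" means a fixed random mask with density `density` whose zero entries are never updated. All names (`sparse_init`, `expand`, `gated_sgd_step`, `gate_open`, `density`) are hypothetical, and the paper's actual gating rule, mask selection, and placement of the expanded layer may differ.

```python
import copy
import random

# Hypothetical toy representation (an assumption, not from the paper): a model
# is a list of layers; each layer holds a weight matrix, a boolean sparsity
# mask, and a per-layer gate that decides whether gradients are applied.


def sparse_init(rows, cols, density, rng):
    """Assumed sparse initialization: each weight is nonzero with probability
    `density` (drawn from N(0, 0.1^2)); every other weight starts at zero."""
    weights = [[0.0] * cols for _ in range(rows)]
    mask = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if rng.random() < density:
                weights[i][j] = rng.gauss(0.0, 0.1)
                mask[i][j] = True
    return weights, mask


def expand(model, width, density, rng):
    """Layer expansion: close the gate on every existing layer, then append a
    new sparsely initialized width x width layer with an open gate."""
    for layer in model:
        layer["gate_open"] = False
    weights, mask = sparse_init(width, width, density, rng)
    model.append({"weights": weights, "mask": mask, "gate_open": True})


def gated_sgd_step(model, grads, lr):
    """One SGD step with gradient gating and sparse training: gradients of
    gated-off layers are dropped, and inside open layers only weights in the
    sparsity mask are updated, so masked-out weights stay exactly zero."""
    for layer, grad in zip(model, grads):
        if not layer["gate_open"]:
            continue  # frozen: previously learned weights are left untouched
        for i, row in enumerate(layer["weights"]):
            for j in range(len(row)):
                if layer["mask"][i][j]:
                    row[j] -= lr * grad[i][j]


# Usage: one dense "already trained" layer, then grow and take one step.
rng = random.Random(0)
old_w, old_m = sparse_init(4, 4, 1.0, rng)
model = [{"weights": old_w, "mask": old_m, "gate_open": True}]
snapshot = copy.deepcopy(model[0]["weights"])

expand(model, width=4, density=0.25, rng=rng)
before_new = copy.deepcopy(model[1]["weights"])
ones = [[1.0] * 4 for _ in range(4)]
gated_sgd_step(model, [ones, ones], lr=0.1)

print(model[0]["weights"] == snapshot)  # True: the gated old layer is unchanged
```

In this toy, gating is all-or-nothing per layer, so the old layer is preserved exactly by construction, while the mask density caps how many parameters the new layer can adapt.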
