The Solution for the sequential task continual learning track of the 2nd Greater Bay Area International Algorithm Competition
Sishun Pan, Xixian Wu, Tingmin Li, Longfei Huang, Mingxu Feng, Zhonghua Wan, Yang Yang
TL;DR
This work tackles continual learning for sequential tasks under both task-incremental and domain-incremental settings without access to historical data. It builds on a data-free parameter-isolation framework inspired by Winning SubNetworks, allocating per-task subnetworks within a ResNet backbone while freezing the batch normalization (BN) layers after the first task and freezing the shared classifier for domain increments. Novel components include a task-ID inference mechanism for the domain-incremental setting, gradient supplementation to bolster learning of new tasks, dynamic subnetwork sizing to preserve capacity for long task sequences, and an efficient mask matrix compression scheme. Results on the competition setup show progressive improvements across ablations, and the solution won a second-place prize. The approach demonstrates practical, resource-efficient continual learning suitable for domain-shift scenarios where data replay is unavailable.
Abstract
This paper presents a data-free, parameter-isolation-based continual learning algorithm that we developed for the sequential task continual learning track of the 2nd Greater Bay Area International Algorithm Competition. The method learns an independent parameter subspace for each task within the network's convolutional and linear layers and freezes the batch normalization layers after the first task. For the domain-incremental setting, where all domains share one classification head, we freeze the shared head once the first task is completed, effectively addressing catastrophic forgetting. Because the domain-incremental setting provides no task identity at test time, we designed a task-identity inference strategy that selects an appropriate mask matrix for each sample. We further introduce a gradient supplementation strategy that raises the importance of parameters not selected for the current task, which facilitates learning of new tasks. We also implement an adaptive importance scoring strategy that dynamically adjusts the number of parameters allocated to each task, optimizing single-task performance while reducing parameter usage. Finally, given the limits on storage space and inference time, we designed a mask matrix compression strategy that saves storage and speeds up encoding and decoding of the mask matrices. Our approach does not require expanding the core network or using external auxiliary networks or data, and it performs well under both task-incremental and domain-incremental settings. The solution won a second-place prize in the competition.
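To make the idea of compressing per-task mask matrices concrete, the following is a minimal sketch that packs a binary mask into bytes, eight entries per byte. The abstract does not specify the compression scheme actually used, so plain bit-packing is an assumption here, and the function names `pack_mask` and `unpack_mask` are hypothetical.

```python
from typing import List, Tuple


def pack_mask(mask: List[List[int]]) -> Tuple[bytes, Tuple[int, int]]:
    """Pack a 2-D binary mask into bytes, 8 entries per byte (row-major).

    Assumption: simple bit-packing; not necessarily the authors' scheme.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    flat = [bit for row in mask for bit in row]
    packed = bytearray((len(flat) + 7) // 8)
    for i, bit in enumerate(flat):
        if bit:
            # Most significant bit first within each byte.
            packed[i // 8] |= 0x80 >> (i % 8)
    return bytes(packed), (rows, cols)


def unpack_mask(packed: bytes, shape: Tuple[int, int]) -> List[List[int]]:
    """Recover the 2-D binary mask from its packed bytes and its shape."""
    rows, cols = shape
    flat = [(packed[i // 8] >> (7 - i % 8)) & 1 for i in range(rows * cols)]
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


# Toy 3x4 mask: 1 marks a weight that belongs to the task's subnetwork.
mask = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]
packed, shape = pack_mask(mask)
restored = unpack_mask(packed, shape)
```

In this toy case the 12 mask entries fit in 2 bytes, and unpacking with the stored shape recovers the original mask exactly. Compared with storing one byte per entry, this kind of packing reduces mask storage by roughly a factor of eight.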
