
The Solution for the sequential task continual learning track of the 2nd Greater Bay Area International Algorithm Competition

Sishun Pan, Xixian Wu, Tingmin Li, Longfei Huang, Mingxu Feng, Zhonghua Wan, Yang Yang

TL;DR

This work tackles continual learning for sequential tasks under both task-incremental and domain-incremental settings, without access to historical data. It builds on a data-free parameter-isolation framework inspired by Winning SubNetworks, allocating a per-task subnetwork within a ResNet backbone, freezing the batch normalization (BN) layers after the first task, and freezing the shared classifier in the domain-incremental setting. Novel components include a task ID inference mechanism for the domain-incremental setting, gradient supplementation to strengthen learning of new tasks, dynamic subnetwork sizing to preserve capacity over long task sequences, and an efficient mask matrix compression scheme. Ablations on the competition setup show progressive improvements as each component is added, and the solution won a second-place prize. The approach is a practical, resource-efficient option for continual learning under domain shift when data replay is unavailable.
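To make the parameter-isolation idea concrete, here is a minimal sketch in the spirit of Winning SubNetworks. It is an illustration under stated assumptions, not the authors' implementation: a layer is represented as a flat list of weights, each weight has a learned importance score, and the names `select_task_mask`, `masked_update`, and `capacity` are hypothetical. The current task keeps the highest-scoring fraction of weights, and any weight already claimed by an earlier task may be reused in the forward pass but is never updated.

```python
def select_task_mask(scores, capacity):
    """Keep the top `capacity` fraction of weights, ranked by importance score."""
    k = max(1, round(capacity * len(scores)))
    # Indices of the k highest-scoring weights.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    mask = [False] * len(scores)
    for i in top:
        mask[i] = True
    return mask


def masked_update(weights, grads, task_mask, used_mask, lr=0.1):
    """Apply a gradient step only to weights the current task selected
    and that no earlier task already owns; all other weights stay frozen."""
    return [
        w - lr * g if (selected and not used) else w
        for w, g, selected, used in zip(weights, grads, task_mask, used_mask)
    ]


# Hypothetical toy layer with four weights.
scores = [0.1, 0.9, 0.4, 0.8]
task_mask = select_task_mask(scores, capacity=0.5)  # selects indices 1 and 3
used_mask = [False, True, False, False]             # index 1 belongs to an earlier task
weights = masked_update([1.0] * 4, [1.0] * 4, task_mask, used_mask, lr=0.5)
```

In this toy run only index 3 changes: index 1 is selected but frozen because an earlier task owns it, which is what protects earlier tasks from being overwritten.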

Abstract

This paper presents a data-free, parameter-isolation-based continual learning algorithm that we developed for the sequential task continual learning track of the 2nd Greater Bay Area International Algorithm Competition. The method learns an independent parameter subspace for each task within the network's convolutional and linear layers and freezes the batch normalization layers after the first task. For the domain-incremental setting, where all domains share a classification head, we freeze the shared head once the first task is completed, which effectively addresses catastrophic forgetting. Because the domain-incremental setting does not provide a task identity at test time, we designed a task identity inference strategy that selects an appropriate mask matrix for each sample. We also introduced a gradient supplementation strategy that raises the importance of parameters not selected for the current task, which facilitates learning of new tasks. In addition, we implemented an adaptive importance scoring strategy that dynamically adjusts the number of parameters allocated to a task, optimizing single-task performance while reducing parameter usage. Finally, given the limits on storage space and inference time, we designed a mask matrix compression strategy that saves storage and speeds up encoding and decoding of the mask matrix. Our approach does not require expanding the core network or using external auxiliary networks or data, and it performs well under both task-incremental and domain-incremental settings. This solution ultimately won a second-place prize in the competition.
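The abstract does not specify how the mask matrix is compressed. As one plausible illustration only, a binary mask can be stored at one bit per weight and then passed through a general-purpose compressor. The sketch below assumes a flattened boolean mask; `pack_mask` and `unpack_mask` are hypothetical names, and the use of `zlib` is a swapped-in choice rather than the competition scheme.

```python
import zlib


def pack_mask(mask):
    """Pack a flat list of booleans at 8 entries per byte, then deflate.

    Returns the mask length (needed to undo the byte padding) and the blob.
    """
    bits = 0
    for i, keep in enumerate(mask):
        if keep:
            bits |= 1 << i
    raw = bits.to_bytes((len(mask) + 7) // 8, "little")
    return len(mask), zlib.compress(raw)


def unpack_mask(length, blob):
    """Invert pack_mask: inflate, then read one bit per mask entry."""
    bits = int.from_bytes(zlib.decompress(blob), "little")
    return [bool((bits >> i) & 1) for i in range(length)]


# Round trip on a hypothetical 101-entry mask.
mask = [True, False] * 50 + [True]
length, blob = pack_mask(mask)
restored = unpack_mask(length, blob)
```

Storing the length alongside the blob matters because the last byte is zero-padded. For masks with millions of entries, a vectorized bit-packing routine would be a better fit than this pure-Python loop.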

Paper Structure

This paper contains 13 sections, 5 equations, 2 figures, and 1 table.

Figures (2)

  • Figure 1: Examples of different incremental learning tasks. The left column shows a sequence of tasks in the Task Incremental setting, including test examples from the historical task $D_{i-1}$ and the current task $D_i$. The right column shows examples from various domains in the Domain Incremental setting, where the task ID is unknown during testing.
  • Figure 2: Overview of the system architecture. (a) We train the feature extractor and the task classifier $k$ at task $k$. (b) Transformer and adapter module with specific masks.