Towards stable training of parallel continual learning
Li Yuepan, Fan Lyu, Yuyang Li, Wei Feng, Guangcan Liu, Fanhua Shang
TL;DR
This work analyzes training instability in Parallel Continual Learning (PCL) and introduces Stable Parallel Continual Learning (SPCL), a dual-strategy framework combining forward-path DBT-based orthogonal regularization for CNN kernels and backward-path gradient decomposition to reduce inter-task gradient conflicts. The authors formalize PCL, identify the condition number of the gradient system as a key stability metric, and propose a practical optimization workflow that maintains orthogonality without sacrificing representational capacity. Empirical results on PS-EMNIST, PS-CIFAR-100, and PS-ImageNet-TINY show that SPCL improves stability and accuracy over state-of-the-art baselines, with ablations highlighting the complementary benefits of gradient and filter orthogonality. By mitigating interference among forward activations and conflicts among backward gradients, the approach advances robust multi-task learning in dynamic, multi-source data environments such as autonomous systems, and it opens avenues for adaptive hyperparameter strategies in future work.
Abstract
Parallel Continual Learning (PCL) investigates training methods for continual learning with multi-source input, where data from different tasks are learned as they arrive. PCL offers high training efficiency and is well-suited for complex multi-source data systems, such as autonomous vehicles equipped with multiple sensors. However, at any time, multiple tasks need to be trained simultaneously, leading to severe training instability in PCL. This instability manifests during both forward and backward propagation, where features become entangled and gradients conflict. This paper introduces Stable Parallel Continual Learning (SPCL), a novel approach that enhances the training stability of PCL in both forward and backward propagation. For forward propagation, we apply Doubly-block Toeplitz (DBT) matrix-based orthogonality constraints to network parameters to ensure stable and consistent propagation. For backward propagation, we employ orthogonal decomposition for gradient management, which stabilizes backpropagation and mitigates gradient conflicts across tasks. By enforcing gradient orthogonality and minimizing the condition number, SPCL effectively stabilizes gradient descent in complex optimization tasks. Experimental results demonstrate that SPCL outperforms state-of-the-art methods and achieves better training stability.
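To make the backward-path idea concrete, the following minimal sketch shows how the condition number of a two-task gradient matrix reacts when the gradients are made orthogonal. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not specify the decomposition, so plain Gram-Schmidt is used here as a stand-in, and the helper names (`dot`, `condition_number`, `orthogonalize`) and the toy gradients are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def condition_number(g1, g2):
    """Condition number of G = [g1; g2]: ratio of its largest to smallest singular value."""
    a, b, c = dot(g1, g1), dot(g1, g2), dot(g2, g2)
    # The eigenvalues of the 2x2 Gram matrix G G^T are the squared singular values of G.
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    lam_max, lam_min = mean + radius, mean - radius
    if lam_min <= 1e-12 * max(lam_max, 1.0):
        return math.inf  # (nearly) parallel gradients: the system is singular
    return math.sqrt(lam_max / lam_min)


def orthogonalize(g1, g2):
    """Gram-Schmidt stand-in: keep the direction of g1, strip from g2 its component along g1.

    Both outputs are rescaled to unit norm, so the original gradient magnitudes are
    discarded. This is an assumption made for the illustration only.
    """
    n1 = math.sqrt(dot(g1, g1))
    if n1 == 0.0:
        raise ValueError("g1 must be non-zero")
    u1 = [x / n1 for x in g1]
    proj = dot(g2, u1)
    residual = [x - proj * y for x, y in zip(g2, u1)]
    n2 = math.sqrt(dot(residual, residual))
    if n2 == 0.0:
        raise ValueError("g2 is parallel to g1; no orthogonal component remains")
    u2 = [x / n2 for x in residual]
    return u1, u2


# Toy gradients of two tasks that point in nearly opposite directions (a conflict).
g1 = [1.0, 0.0, 0.0]
g2 = [-0.9, 0.1, 0.0]

print("conflict (negative inner product):", dot(g1, g2) < 0)
print("condition number before:", condition_number(g1, g2))
u1, u2 = orthogonalize(g1, g2)
print("condition number after: ", condition_number(u1, u2))
```

In this toy case the conflicting pair yields a large condition number, while the orthonormal pair brings it down to the minimum possible value of 1. That is the sense in which orthogonality and a small condition number go together in the abstract's description.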
