Towards stable training of parallel continual learning

Li Yuepan; Fan Lyu; Yuyang Li; Wei Feng; Guangcan Liu; Fanhua Shang

TL;DR

This work analyzes training instability in Parallel Continual Learning (PCL) and introduces Stable Parallel Continual Learning (SPCL), a dual-strategy framework that combines DBT-based orthogonal regularization of CNN kernels on the forward path with gradient decomposition on the backward path to reduce inter-task gradient conflicts. The authors formalize PCL, identify the condition number of the gradient system as a key stability metric, and propose a practical optimization workflow that maintains orthogonality without sacrificing representational capacity. Empirical results on PS-EMNIST, PS-CIFAR-100, and PS-ImageNet-TINY show that SPCL improves stability and accuracy over state-of-the-art baselines, with ablations highlighting the complementary benefits of gradient and filter orthogonality. By mitigating interference between forward activations and conflicts between backward gradients, the approach advances robust multi-task learning in dynamic, multi-source data environments such as autonomous systems, and it opens avenues for adaptive hyperparameter strategies in future work.
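To make "filter orthogonality" concrete, the following is a minimal sketch of a generic soft orthogonality penalty, $\lVert \mathbf{K}\mathbf{K}^\top - I \rVert_F^2$, on a small matrix whose rows stand for flattened filters. It is not the paper's DBT-based regularizer, whose construction is not reproduced on this page; the function name `gram_minus_identity_penalty` and the toy filters are assumptions chosen for illustration.

```python
def gram_minus_identity_penalty(rows):
    """Return ||K K^T - I||_F^2 for a matrix K given as a list of row vectors.

    Illustrative stand-in for an orthogonality regularizer; NOT the
    DBT-based construction used in the paper.
    """
    n = len(rows)
    penalty = 0.0
    for i in range(n):
        for j in range(n):
            gram_ij = sum(a * b for a, b in zip(rows[i], rows[j]))
            target = 1.0 if i == j else 0.0
            penalty += (gram_ij - target) ** 2
    return penalty


# Two orthonormal filters: the Gram matrix is the identity, so the penalty is zero.
orthonormal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# Two identical filters: fully redundant, so the off-diagonal terms are penalized.
redundant = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

penalty_orthonormal = gram_minus_identity_penalty(orthonormal)
penalty_redundant = gram_minus_identity_penalty(redundant)
```

In a training loop, a term of this kind would typically be added to the task loss with a small weight, so that filters are pushed toward orthogonality rather than forced onto it.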

Abstract

Parallel Continual Learning (PCL) studies training methods for continual learning with multi-source input, where data from different tasks are learned as they arrive. PCL offers high training efficiency and is well suited to complex multi-source data systems, such as autonomous vehicles equipped with multiple sensors. However, multiple tasks must be trained simultaneously at any given time, which leads to severe training instability. This instability manifests during both forward and backward propagation, where features become entangled and gradients conflict. This paper introduces Stable Parallel Continual Learning (SPCL), a novel approach that enhances the training stability of PCL in both forward and backward propagation. For forward propagation, we apply Doubly-block Toeplitz (DBT) matrix-based orthogonality constraints to network parameters to ensure stable and consistent propagation. For backward propagation, we employ orthogonal decomposition for gradient management, which stabilizes backpropagation and mitigates gradient conflicts across tasks. By enforcing gradient orthogonality and minimizing the condition number, SPCL effectively stabilizes gradient descent in complex optimization tasks. Experimental results demonstrate that SPCL outperforms state-of-the-art methods and achieves better training stability.
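The link between gradient conflict and the condition number can be illustrated with two task gradients. The sketch below assumes the standard definition of the condition number (largest over smallest singular value of the matrix whose rows are the task gradients); the paper's own Definition 1 is not reproduced on this page and may differ. A single Gram-Schmidt step stands in for the paper's orthogonal decomposition, and all function names and example vectors are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def condition_number_two_tasks(g1, g2):
    """sigma_max / sigma_min of the 2 x d matrix with rows g1 and g2.

    Assumes the textbook definition of the condition number.
    """
    a, b, c = dot(g1, g1), dot(g1, g2), dot(g2, g2)
    # Closed-form eigenvalues of the 2 x 2 Gram matrix [[a, b], [b, c]].
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    lam_max, lam_min = mean + radius, mean - radius
    if lam_min <= 0.0:
        return math.inf  # collinear gradients: the system is singular
    return math.sqrt(lam_max / lam_min)


def remove_component(g, ref):
    """Return g minus its projection onto ref (one Gram-Schmidt step)."""
    scale = dot(g, ref) / dot(ref, ref)
    return [x - scale * y for x, y in zip(g, ref)]


g1 = [1.0, 0.0]
g2 = [-0.8, 0.6]  # negative inner product with g1, i.e. a conflicting gradient

cond_before = condition_number_two_tasks(g1, g2)
g2_orth = remove_component(g2, g1)
cond_after = condition_number_two_tasks(g1, g2_orth)
```

In this toy case the condition number drops from about 3 to about 1.67 once the conflicting component is removed. It does not reach 1 because the two gradients still differ in norm.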

Paper Structure

This paper contains 25 sections, 2 theorems, 17 equations, 13 figures, 2 tables, 2 algorithms.

Introduction
Related Work
Stable Training in PCL
Preliminary: Parallel Continual Learning
Stable Training for PCL in perspective of orthogonality
Method
Overview
Doubly-block Toeplitz Matrix based Orthogonal Regularization Method
Structured Orthogonal Regularization Optimization
Algorithm
Experiment
Dataset
Experiment details
Main Results
SPCL learning process
...and 10 more sections

Key Result

Theorem 1

(Preservation of Orthogonality in Linear Transformations.) For a linear transformation $\mathbf{Y} = \mathbf{K} \cdot \mathbf{X}$ in CNNs, where $\mathbf{K}$ is orthogonal ($\mathbf{K}^\top \mathbf{K} = I$), the following properties hold: (1) The norm of the output vector $\mathbf{Y}$ is equal to th
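The theorem statement above is cut off on this page. Independently of its exact wording, it is a standard fact that any matrix with $\mathbf{K}^\top \mathbf{K} = I$ preserves the Euclidean norm, since $\lVert \mathbf{K}\mathbf{x} \rVert^2 = \mathbf{x}^\top \mathbf{K}^\top \mathbf{K} \mathbf{x} = \lVert \mathbf{x} \rVert^2$. The short check below uses a 2 x 2 rotation as an example of such a matrix; the angle, the input vector, and the helper names are arbitrary choices for illustration.

```python
import math


def rotate(theta, x):
    """Apply the 2 x 2 rotation matrix K(theta), which satisfies K^T K = I."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]]


def norm(v):
    return math.sqrt(sum(a * a for a in v))


x = [3.0, 4.0]
y = rotate(0.7, x)  # y = K x for an orthogonal K

norm_x = norm(x)
norm_y = norm(y)  # equals norm_x up to floating-point rounding
```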

Figures (13)

Figure 1: A toy experiment on CIFAR-10 with a two-task PCL setup. One task is trained throughout; in the line graph, a dashed line marks the point at which the second task joins training.
Figure 2: Learning process comparisons on two datasets.
Figure 3: Ablation study on CIFAR-10 with two tasks. The dashed line indicates the start of the second task; the curves show accuracy over time for both tasks under different orthogonalization conditions.
Figure 4: Numerical analysis of gradient matrix on PS-CIFAR-100.
Figure 5: Numerical analysis of CNN convolutional layers after adopting a DBT-based orthogonality regularization method on CIFAR-100.
...and 8 more figures

Theorems & Definitions (3)

Theorem 1
Definition 1: Condition number of gradient system
Theorem 2
