Pareto Continual Learning: Preference-Conditioned Learning and Adaption for Dynamic Stability-Plasticity Trade-off

Song Lai, Zhe Zhao, Fei Zhu, Xi Lin, Qingfu Zhang, Gaofeng Meng

TL;DR

This work addresses the stability-plasticity dilemma in continual learning by reframing it as a multi-objective optimization (MOO) problem over two losses: $\mathcal{L}_{replay}$ (stability) and $\mathcal{L}_{new}$ (plasticity). It introduces ParetoCL, a preference-conditioned learning framework that jointly learns a set of Pareto-optimal trade-offs: a hypernetwork generates classifier parameters conditioned on a preference vector $\alpha \in \mathbb{R}^2_+$, which enables dynamic adaptation during inference. The key contributions are formulating experience replay (ER) as MOO to approximate the Pareto front, designing an efficient preference-conditioned model, and outperforming state-of-the-art methods on Seq-CIFAR10, Seq-CIFAR100, and Seq-TinyImageNet in both online and offline settings, supported by ablations and analyses of dynamic adaptation and buffer-size robustness. The results suggest that objective augmentation, i.e. learning multiple preference-conditioned hypotheses, improves generalization in non-stationary continual learning environments and supports practical, sample-wise adaptation of the stability-plasticity trade-off.

Abstract

Continual learning aims to learn multiple tasks sequentially. A key challenge in continual learning is balancing between two objectives: retaining knowledge from old tasks (stability) and adapting to new tasks (plasticity). Experience replay methods, which store and replay past data alongside new data, have become a widely adopted approach to mitigate catastrophic forgetting. However, these methods neglect the dynamic nature of the stability-plasticity trade-off and aim to find a fixed and unchanging balance, resulting in suboptimal adaptation during training and inference. In this paper, we propose Pareto Continual Learning (ParetoCL), a novel framework that reformulates the stability-plasticity trade-off in continual learning as a multi-objective optimization (MOO) problem. ParetoCL introduces a preference-conditioned model to efficiently learn a set of Pareto optimal solutions representing different trade-offs and enables dynamic adaptation during inference. From a generalization perspective, ParetoCL can be seen as an objective augmentation approach that learns from different objective combinations of stability and plasticity. Extensive experiments across multiple datasets and settings demonstrate that ParetoCL outperforms state-of-the-art methods and adapts to diverse continual learning scenarios.

Paper Structure

This paper contains 20 sections, 7 equations, 3 figures, 4 tables, 2 algorithms.

Figures (3)

  • Figure 1: Motivation. (left) illustrates the multi-stage training process of a ResNet-18 model on 5 tasks of Seq-CIFAR10, while (right) compares ER-DPS with several SOTA replay-based methods. In (left), each stage builds upon the model trained on the previous task, incorporating different stability-plasticity trade-offs. Each point in (left) corresponds to the accuracy obtained when training the model under a specific preference vector, demonstrating how different trade-offs influence stability and plasticity. The metrics in the graph, $A_{\text{old}}$ and $A_{\text{new}}$, represent the average accuracy on previous tasks and the accuracy on the current task, respectively, corresponding to stability and plasticity. The dashed line indicates the model selected at the current stage, corresponding to the chosen preference. Detailed data is provided in the supplementary material.
  • Figure 2: An overview of our proposed ParetoCL framework. Our model consists of two parts: a shared encoder and a preference-based classifier (with parameters generated by a hypernetwork). The model takes inputs from the data stream and memory buffer, which are then transformed into embeddings by the encoder. These embeddings are used by the preference-based classifier multiple times with different preferences, each of which is sampled from a prior distribution. The sampled preference is then fed into the hypernetwork to obtain the parameters of the preference-based classifier. For the embeddings from both the data stream and memory buffer, the classifier computes two losses: $\mathcal{L}_{new}$ and $\mathcal{L}_{replay}$. The final optimization objective of the model is the expectation of the preference-based aggregation of these two losses. Through this approach, ParetoCL learns a mapping from the preference space to the objective space of different plasticity-stability trade-offs.
  • Figure 3: Effectiveness of Preference-Conditioned Learning and Dynamic Preference Adaptation. (left) shows the Pareto front approximated by ParetoCL in each stage on Seq-CIFAR10 in the offline setting. The results demonstrate that ParetoCL can effectively explore various stability-plasticity trade-offs; (right) compares the average incremental accuracy of different methods on Seq-CIFAR100 in the online setting. Dynamic Preference Adaptation can effectively improve performance compared to ParetoCL-- and other baselines.
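The training objective described in the Figure 2 caption can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation: the encoder is replaced by precomputed embeddings, and the layer sizes, the two-layer hypernetwork, the Dirichlet prior over preferences, and the weighted-sum aggregation of $\mathcal{L}_{replay}$ and $\mathcal{L}_{new}$ are all assumptions made for the example. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration only.
FEAT_DIM, NUM_CLASSES, HIDDEN = 16, 10, 32


def init_hypernet():
    """Two-layer MLP mapping a 2-d preference to linear-classifier parameters."""
    out_dim = FEAT_DIM * NUM_CLASSES + NUM_CLASSES
    return {
        "w1": rng.normal(0.0, 0.1, (2, HIDDEN)),
        "b1": np.zeros(HIDDEN),
        "w2": rng.normal(0.0, 0.1, (HIDDEN, out_dim)),
        "b2": np.zeros(out_dim),
    }


def generate_classifier(hyper, alpha):
    """Hypernetwork forward pass: preference vector -> (W, b)."""
    h = np.maximum(alpha @ hyper["w1"] + hyper["b1"], 0.0)  # ReLU
    flat = h @ hyper["w2"] + hyper["b2"]
    split = FEAT_DIM * NUM_CLASSES
    return flat[:split].reshape(FEAT_DIM, NUM_CLASSES), flat[split:]


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed in a numerically stable way."""
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def sample_preference():
    """Assumed prior: symmetric Dirichlet, so alpha >= 0 and sums to 1."""
    return rng.dirichlet([1.0, 1.0])


def expected_aggregated_loss(hyper, z_new, y_new, z_buf, y_buf, num_samples=4):
    """Monte Carlo estimate of E_alpha[alpha_1 * L_replay + alpha_2 * L_new].

    z_new / z_buf stand in for encoder embeddings of stream / buffer samples.
    The weighted sum is an assumed aggregation, not necessarily the paper's.
    """
    total = 0.0
    for _ in range(num_samples):
        alpha = sample_preference()
        W, b = generate_classifier(hyper, alpha)
        loss_replay = cross_entropy(z_buf @ W + b, y_buf)  # stability
        loss_new = cross_entropy(z_new @ W + b, y_new)     # plasticity
        total += alpha[0] * loss_replay + alpha[1] * loss_new
    return total / num_samples
```

In a real training loop this scalar would be minimized with respect to both the encoder and the hypernetwork; at inference time, different values of `alpha` passed to `generate_classifier` yield classifiers with different stability-plasticity trade-offs from the same trained model.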

Theorems & Definitions (3)

  • Definition 1: Pareto Dominance
  • Definition 2: Pareto Optimality
  • Definition 3: Pareto Set and Pareto Front
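
The three definitions above are only named here, so the following sketch uses the standard textbook versions for a minimization problem, which may differ in notation from the paper's statements. A point dominates another if it is no worse in every objective and strictly better in at least one; a point is Pareto optimal if nothing dominates it; the Pareto front is the set of objective vectors of all Pareto-optimal points. The loss pairs below are made-up values of ($\mathcal{L}_{replay}$, $\mathcal{L}_{new}$), used purely for illustration.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep the points that no other point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up (L_replay, L_new) pairs for four candidate models.
losses = [(0.2, 0.9), (0.4, 0.5), (0.5, 0.6), (0.9, 0.1)]
front = pareto_front(losses)
# (0.5, 0.6) is dropped because (0.4, 0.5) is better on both objectives.
```

This pairwise filter is quadratic in the number of points, which is fine for the handful of preference vectors typically plotted per training stage.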