
Improving Plasticity in Online Continual Learning via Collaborative Learning

Maorong Wang, Nicolas Michel, Ling Xiao, Toshihiko Yamasaki

TL;DR

This work reframes online Continual Learning (CL) by treating model plasticity as a challenge on par with stability. It introduces Collaborative Continual Learning with Distillation Chain (CCL-DC), a framework in which two peer learners teach each other, augmented by a Distillation Chain that regulates prediction entropy; it is designed to plug into existing online CL methods. By defining Learning Accuracy ($LA$) and Relative Forgetting ($RF$) and deriving the approximate relation $AA \gtrapprox LA \times (1 - RF)$, where $AA$ is the final average accuracy, the paper quantifies how plasticity and stability jointly determine final performance. Across four image-classification datasets and multiple baselines, CCL-DC consistently improves plasticity and final accuracy, with substantial gains that persist across memory buffer sizes; the paper also provides extensive ablations, visualizations, and implementation details. The results indicate that collaborative learning combined with entropy-regularized distillation is an effective way to improve online CL.
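The metric relation above can be made concrete with a small worked example. This is a hypothetical sketch, not the authors' code: it assumes an accuracy matrix `acc[i][j]` holding the accuracy on task `j` measured right after training on task `i`, takes $LA$ as the mean of the diagonal, $AA$ as the mean of the last row, and assumes $RF$ is the mean per-task relative drop $(a_{jj} - a_{Tj}) / a_{jj}$ averaged over all tasks (the last task contributes zero). The function names and the toy numbers are illustrative only.

```python
# Hypothetical sketch: acc[i][j] is the accuracy on task j measured right after
# training on task i (only entries with i >= j are used). The exact metric
# definitions are assumptions, not taken from the paper.

def learning_accuracy(acc):
    """LA: mean accuracy on each task immediately after it is learned."""
    T = len(acc)
    return sum(acc[j][j] for j in range(T)) / T


def average_accuracy(acc):
    """AA: mean accuracy over all tasks after training on the last task."""
    T = len(acc)
    return sum(acc[T - 1][j] for j in range(T)) / T


def relative_forgetting(acc):
    """RF (assumed form): mean relative drop from just-learned to final accuracy.

    The last task has zero forgetting by construction and is included in the mean.
    """
    T = len(acc)
    drops = [(acc[j][j] - acc[T - 1][j]) / acc[j][j] for j in range(T)]
    return sum(drops) / T


# Toy 3-task accuracy matrix (made-up numbers).
acc = [
    [0.70, 0.00, 0.00],
    [0.50, 0.80, 0.00],
    [0.45, 0.70, 0.75],
]

la = learning_accuracy(acc)
rf = relative_forgetting(acc)
aa = average_accuracy(acc)
print(f"LA={la:.3f} RF={rf:.3f} AA={aa:.3f} LA*(1-RF)={la * (1 - rf):.3f}")
# prints: LA=0.750 RF=0.161 AA=0.633 LA*(1-RF)=0.629
```

Under these assumed definitions, $AA$ equals the mean of the per-task products $a_{jj}(1 - f_j)$ exactly; replacing it with the product of the means $LA \times (1 - RF)$ drops a covariance term, which is why the relation is only approximate. In this toy matrix the covariance is positive (task 1 is both the least well learned and the most forgotten), so $AA$ lands slightly above $LA \times (1 - RF)$.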

Abstract

Online Continual Learning (CL) addresses the problem of learning ever-emerging new classification tasks from a continuous data stream. Unlike in its offline counterpart, in online CL the training data can only be seen once. Most existing online CL research regards catastrophic forgetting (i.e., model stability) as almost the only challenge. In this paper, we argue that the model's capability to acquire new knowledge (i.e., model plasticity) is another challenge in online CL. While replay-based strategies have been shown to be effective in alleviating catastrophic forgetting, improving model plasticity has received notably little research attention. To this end, we propose Collaborative Continual Learning (CCL), a collaborative-learning-based strategy to improve the model's capability to acquire new concepts. Additionally, we introduce Distillation Chain (DC), a collaborative learning scheme to boost the training of the models. We apply CCL-DC to existing representative online CL works. Extensive experiments demonstrate that even when the learners are already well trained with state-of-the-art online CL methods, our strategy can still improve model plasticity dramatically, and thereby improve the overall performance by a large margin. The source code of our work is available at https://github.com/maorong-wang/CCL-DC.
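The peer-teaching idea behind CCL can be illustrated with a two-learner objective in the style of deep mutual learning: each learner fits the label and is additionally pulled toward its peer's softened prediction. This is a minimal sketch under stated assumptions, not the authors' implementation: the KL-based form, the `weight` and `temperature` parameters, and all function names are hypothetical, and plain Python lists stand in for batched tensors.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true label."""
    return -math.log(probs[label])


def kl_div(p_target, q_student):
    """KL(p_target || q_student); p_target plays the role of a fixed teacher."""
    return sum(p * math.log(p / q) for p, q in zip(p_target, q_student) if p > 0)


def peer_losses(logits_1, logits_2, label, weight=1.0, temperature=1.0):
    """Hypothetical two-peer objective: each learner fits the label and
    additionally matches its peer's softened prediction."""
    p1 = softmax(logits_1, temperature)
    p2 = softmax(logits_2, temperature)
    loss_1 = cross_entropy(softmax(logits_1), label) + weight * kl_div(p2, p1)
    loss_2 = cross_entropy(softmax(logits_2), label) + weight * kl_div(p1, p2)
    return loss_1, loss_2


# Usage with made-up logits for a 3-class problem.
l1, l2 = peer_losses([2.0, 0.5, -1.0], [1.0, 1.2, -0.5], label=0)
print(f"learner 1 loss={l1:.4f}, learner 2 loss={l2:.4f}")
```

In an autograd framework, the peer's distribution in each KL term would be detached (stop-gradient), so each learner moves toward its peer without pushing the peer toward itself through the same term.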

Paper Structure (47 sections, 11 equations, 12 figures, 12 tables, 1 algorithm)

Figures (12)

  • Figure 1: Comparison of plasticity (learning accuracy) and stability (relative forgetting, the metric proposed in the section on the stability-plasticity tradeoff) of Experience Replay (ER) [rolnick2019experience] under various settings on CIFAR-100. For experiments with memory replay, the memory buffer size is set to 2,000. A plasticity gap between offline CL and online CL remains even with memory replay and the multiple-update trick (memory iteration $> 1$).
  • Figure 2: Overview of the proposed CCL-DC framework applied to a baseline online CL method. The framework has two main components. The first is CCL, in which two peer continual learners learn simultaneously from the data stream by teaching each other. The second, DC, generates a chain of samples with increasing difficulty and feeds them into the models to produce a chain of logit distributions with different confidence levels. DC then performs collaborative distillation from less confident predictions to more confident ones, which serves as a form of learned entropy regularization.
  • Figure 3: Conceptual diagram of the training framework when distilling from an untrained network $\theta^2$ to suppress the confidence of network $\theta^1$.
  • Figure 4: Classification loss curve of ER on CIFAR-100 (M=2k). The curve is calculated on all training samples of the current task. Since there are 10 tasks in total, the curve has 10 peaks.
  • Figure 5: Prediction entropy of ER with and without CCL-DC on CIFAR-100 (M=2k). $X_i$ denotes the sample after the $i$-th augmentation in the Distillation Chain equation. The value is computed at the end of training and averaged over all training samples.
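The Distillation Chain described in the Figure 2 caption, and the per-augmentation entropy reported in Figure 5, can be sketched as follows. This is a hypothetical illustration, not the authors' code: it assumes `chain[i]` holds logits for $X_i$, the sample after $i$ augmentations ($X_0$ being the original), that the peer network acts as teacher, that consecutive links are paired so the peer's less confident prediction on $X_{i+1}$ is distilled into the learner's more confident prediction on $X_i$, and that KL divergence is the distillation loss. The augmentations themselves are not shown, and the logits in the usage example are made up.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def kl_div(p_target, q_student):
    """KL(p_target || q_student)."""
    return sum(p * math.log(p / q) for p, q in zip(p_target, q_student) if p > 0)


def distillation_chain_loss(student_chain, teacher_chain):
    """Hypothetical DC loss. student_chain[i] and teacher_chain[i] are logits
    for X_i. Each link distills the teacher's prediction on the harder X_{i+1}
    into the student's prediction on the easier X_i."""
    assert len(student_chain) == len(teacher_chain) >= 2
    total = 0.0
    for i in range(len(student_chain) - 1):
        target = softmax(teacher_chain[i + 1])  # harder input, less confident
        pred = softmax(student_chain[i])        # easier input, more confident
        total += kl_div(target, pred)
    return total / (len(student_chain) - 1)


# Made-up logits for X_0, X_1, X_2 from the learner and from its peer.
student_chain = [[4.0, 1.0, 0.0], [3.0, 1.0, 0.5], [2.0, 1.2, 0.8]]
teacher_chain = [[3.8, 1.1, 0.2], [2.9, 1.1, 0.4], [2.1, 1.3, 0.9]]

print("entropy per link:", [round(entropy(softmax(z)), 3) for z in student_chain])
print("DC loss:", round(distillation_chain_loss(student_chain, teacher_chain), 4))
```

Because each target comes from a harder input, the loss pulls the learner's prediction on the easier input toward a flatter distribution, which is one way to read the caption's "learned entropy regularization".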