Kaizen: Practical Self-supervised Continual Learning with Continual Fine-tuning

Chi Ian Tang; Lorena Qendro; Dimitris Spathis; Fahim Kawsar; Cecilia Mascolo; Akhil Mathur

TL;DR

Kaizen addresses the practical challenge of continual learning with self-supervised representations by jointly training feature extractors and classifiers across a stream of unlabeled and labeled data. It introduces a four-term loss that combines knowledge distillation for both components with current-task self-supervised and supervised learning, augmented by memory replay. Across CIFAR-100 and ImageNet-100 benchmarks and multiple SSL backbones, Kaizen significantly improves continual and final accuracy while reducing forgetting, demonstrating robust performance over time and under longer task sequences. The framework offers flexible deployment, balancing knowledge retention with the ability to learn from new data, and demonstrates practical potential for real-world continual learning systems.

Abstract

Self-supervised learning (SSL) has shown remarkable performance in computer vision tasks when trained offline. However, in a Continual Learning (CL) scenario where new data is introduced progressively, models still suffer from catastrophic forgetting. Retraining a model from scratch to adapt to newly generated data is time-consuming and inefficient. Previous approaches suggested re-purposing self-supervised objectives with knowledge distillation to mitigate forgetting across tasks, assuming that labels from all tasks are available during fine-tuning. In this paper, we generalize self-supervised continual learning in a practical setting where available labels can be leveraged in any step of the SSL process. With an increasing number of continual tasks, this offers more flexibility in the pre-training and fine-tuning phases. With Kaizen, we introduce a training architecture that is able to mitigate catastrophic forgetting for both the feature extractor and classifier with a carefully designed loss function. By using a set of comprehensive evaluation metrics reflecting different aspects of continual learning, we demonstrated that Kaizen significantly outperforms previous SSL models in competitive vision benchmarks, with up to 16.5% accuracy improvement on split CIFAR-100. Kaizen is able to balance the trade-off between knowledge retention and learning from new data with an end-to-end model, paving the way for practical deployment of continual learning systems.
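
The four-term loss described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: negative cosine similarity stands in for both the current-task SSL objective and the feature-extractor distillation term, soft-target cross-entropy stands in for classifier distillation, and the equal default weights are hypothetical. All function and parameter names (`four_term_loss`, `old_feat`, `old_logits`, and so on) are introduced here for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, logits):
    """Cross-entropy between a target distribution and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target_probs, probs) if t > 0)


def neg_cosine(u, v):
    """Negative cosine similarity; an assumed stand-in for an SSL objective."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return -dot / (norm_u * norm_v)


def four_term_loss(feat_view1, feat_view2, old_feat,
                   new_logits, old_logits, label,
                   weights=(1.0, 1.0, 1.0, 1.0)):
    """Combine the four terms; the equal weights are an assumption."""
    # 1. Current-task self-supervised term: agree across two augmented views.
    ssl_current = neg_cosine(feat_view1, feat_view2)
    # 2. Feature-extractor distillation: stay close to the frozen old extractor.
    kd_features = neg_cosine(feat_view1, old_feat)
    # 3. Current-task supervised term: cross-entropy against the true label.
    one_hot = [1.0 if i == label else 0.0 for i in range(len(new_logits))]
    ce_current = cross_entropy(one_hot, new_logits)
    # 4. Classifier distillation: match the frozen old classifier's soft outputs.
    kd_classifier = cross_entropy(softmax(old_logits), new_logits)

    terms = (ssl_current, kd_features, ce_current, kd_classifier)
    total = sum(w * t for w, t in zip(weights, terms))
    return total, terms
```

In a real training loop, the features and logits would come from the current and frozen previous networks, and replayed samples from the memory buffer would be mixed into each batch; none of that is modelled in this sketch.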

Paper Structure (21 sections, 1 equation, 11 figures, 1 table, 1 algorithm)

Introduction
Related Work
Method
Background
Kaizen: Practical continual learner that balances self-supervised learning and fine-tuning
Evaluation for Continual Learning
Self-supervised Continual Learning vs Continual Fine-tuning
Evaluation metrics
Experimental setup
Results
Performance comparison against CSSL
Performance variation across time
Longer continual learning scenarios
Per-task performance breakdown
Comprehensive evaluation of continual learning and SSL methods
...and 6 more sections

Figures (11)

Figure 1: Self-supervised Continual Learning vs Continual Fine-tuning. Existing approaches (a) wait until the end of the continual process to fine-tune, whereas Kaizen (b) leverages distillation across both the feature extraction and fine-tuning steps for each task.
Figure 2: Performance comparison. Models trained using different self-supervised learning methods and knowledge distillation strategies on class-incremental CIFAR-100. The top figure shows the average performance across the entire continual learning process, while the bottom figure shows the performance in the final evaluation.
Figure 3: Overview of the Kaizen framework. Kaizen balances knowledge distillation and current-task learning in an end-to-end manner through a joint loss function. Available SSL methods can be used in training the feature extractors alongside knowledge distillation, while the classifiers are trained on both unlabelled and labelled data through knowledge distillation and fine-tuning.
Figure 4: Calculation of the evaluation metrics. Illustration of our practical evaluation setup with regard to the metrics and tasks used in each calculation.
Figure 5: Average performance over tasks on CIFAR-100. Comparison between Kaizen and baselines using 4 SSL algorithms and 5 tasks. Our model consistently outperforms baselines and is robust to forgetting in later tasks.
...and 6 more figures
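
The class-incremental setup named in the captions above (CIFAR-100 divided into 5 tasks) can be illustrated with a short sketch. It assumes the classes are split into equally sized, disjoint groups in sequential order; the actual class ordering used in the paper is not stated here, and `split_classes` is a hypothetical helper name.

```python
def split_classes(num_classes, num_tasks):
    """Partition class ids 0..num_classes-1 into equal, disjoint tasks.

    Sequential ordering is an assumption made for illustration only.
    """
    if num_classes % num_tasks != 0:
        raise ValueError("num_classes must be divisible by num_tasks")
    per_task = num_classes // num_tasks
    return [list(range(t * per_task, (t + 1) * per_task))
            for t in range(num_tasks)]


# CIFAR-100 has 100 classes; 5 tasks gives 20 new classes per task.
tasks = split_classes(100, 5)
```

Each task then exposes only the samples whose labels fall in its class group, so the model sees new classes at every step and never the full label set at once.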
