Revisiting Supervision for Continual Representation Learning

Daniel Marczak; Sebastian Cygert; Tomasz Trzciński; Bartłomiej Twardowski

TL;DR

The paper addresses the paradox that supervision can degrade continual representation learning: it shows that adding a simple MLP projector to supervised learning (SL+MLP) yields representations that outperform self-supervised learning (SSL) in continual finetuning and transfer. It combines SL+MLP with various continual learning (CL) strategies and runs experiments across multiple CIFAR and SVHN task sequences and downstream tasks, evaluating with k-NN, NMC, and CKA metrics and with spectral analyses. The key contributions are: (1) an empirical demonstration that SL+MLP can surpass SSL in continual settings; (2) evidence of synergistic gains when SL+MLP is coupled with CL methods; (3) an analysis of representation quality, diversity, and stability through forgetting measures, EXC, and eigenvalue spectra; and (4) ablation studies clarifying the critical role of the MLP projector, particularly batch normalization (BN) and basic architectural components. The findings suggest that the projection head is a central factor in shaping the transferability and diversity of representations in continual learning, with practical implications for designing continual learners that use labeled data efficiently.
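To make the k-NN and NMC (nearest mean classifier) evaluation concrete, here is a minimal NumPy sketch of how frozen representations can be scored with both classifiers. It is an illustration under assumptions, not the paper's evaluation code: the Euclidean distance, `k=5`, the function names, and the toy Gaussian "features" are all hypothetical choices made for this example.

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k=5):
    """Majority vote among the k nearest training features (Euclidean distance)."""
    # Pairwise squared distances, shape (n_test, n_train).
    d2 = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = train_y[nearest]  # labels of the k neighbours, shape (n_test, k)
    return np.array([np.bincount(v).argmax() for v in votes])

def nmc_predict(train_x, train_y, test_x):
    """Assign each test feature to the class with the closest mean feature."""
    classes = np.unique(train_y)
    means = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    d2 = ((test_x[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    return classes[d2.argmin(axis=1)]

# Toy stand-in for frozen backbone features: three well-separated clusters.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
train_y = np.repeat(np.arange(3), 20)
train_x = centers[train_y] + 0.3 * rng.standard_normal((60, 2))
test_y = np.repeat(np.arange(3), 5)
test_x = centers[test_y] + 0.3 * rng.standard_normal((15, 2))

knn_acc = (knn_predict(train_x, train_y, test_x) == test_y).mean()
nmc_acc = (nmc_predict(train_x, train_y, test_x) == test_y).mean()
print(f"k-NN accuracy: {knn_acc:.2f}, NMC accuracy: {nmc_acc:.2f}")
```

Neither classifier trains any parameters on top of the features, so the scores reflect the quality of the representation itself rather than of a learned head.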

Abstract

In the field of continual learning, models are designed to learn tasks one after the other. While most research has centered on supervised continual learning, there is a growing interest in unsupervised continual learning, which makes use of the vast amounts of unlabeled data. Recent studies have highlighted the strengths of unsupervised methods, particularly self-supervised learning, in providing robust representations. The improved transferability of those representations built with self-supervised methods is often associated with the role played by the multi-layer perceptron projector. In this work, we depart from this observation and reexamine the role of supervision in continual representation learning. We reckon that additional information, such as human annotations, should not deteriorate the quality of representations. Our findings show that supervised models, when enhanced with a multi-layer perceptron head, can outperform self-supervised models in continual representation learning. This highlights the importance of the multi-layer perceptron projector in shaping feature transferability across a sequence of tasks in continual learning. The code is available on GitHub: https://github.com/danielm1405/sl-vs-ssl-cl.
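The idea of a supervised model "enhanced with a multi-layer perceptron head" can be sketched as a forward pass in which an MLP projector sits between the backbone and the classifier. The NumPy sketch below is illustrative only: the layer sizes, the Linear-BN-ReLU-Linear layout, the single-layer stand-in backbone, and the choice to evaluate representations at the backbone output (before the projector) are assumptions of this example, not details confirmed by the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration only.
in_dim, feat_dim, hidden_dim, proj_dim, n_classes = 32, 512, 1024, 256, 10

def init(n_in, n_out):
    """Small random weights and zero bias for one linear layer."""
    return rng.standard_normal((n_in, n_out)) * 0.05, np.zeros(n_out)

def linear(x, w, b):
    return x @ w + b

def relu(x):
    return np.maximum(x, 0.0)

def batch_norm(x, eps=1e-5):
    """Training-mode batch normalization without learned scale and shift."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

params = {
    "backbone": init(in_dim, feat_dim),   # stand-in for a real ConvNet backbone
    "proj1": init(feat_dim, hidden_dim),  # MLP projector, first layer
    "proj2": init(hidden_dim, proj_dim),  # MLP projector, second layer
    "cls": init(proj_dim, n_classes),     # linear classifier on projected features
}

def represent(x, p):
    """Backbone output: the representation assumed to be evaluated downstream."""
    return relu(linear(x, *p["backbone"]))

def train_logits(x, p):
    """Backbone -> MLP projector (Linear, BN, ReLU, Linear) -> classifier."""
    h = represent(x, p)
    z = relu(batch_norm(linear(h, *p["proj1"])))
    z = linear(z, *p["proj2"])
    return linear(z, *p["cls"])

def cross_entropy(logits, y):
    """Mean softmax cross-entropy, computed in a numerically stable way."""
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()

x = rng.standard_normal((8, in_dim))
y = rng.integers(0, n_classes, size=8)
features = represent(x, params)    # used for k-NN or transfer evaluation
logits = train_logits(x, params)   # used only for the supervised loss
loss = cross_entropy(logits, y)
print(features.shape, logits.shape, float(loss))
```

In this sketch the label-specific objective acts on the projected features, while the backbone output that would be evaluated sits one step removed from it; this mirrors the role the projector is usually credited with in self-supervised methods.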

Paper Structure (23 sections, 13 figures, 8 tables)

Introduction
Related Work
Experimental Setup
Main Results
Continual representation learning
Continual transfer learning
Synergy with CL methods
Analysis
Quality of representations
Spectra of representations
Ablation study
Discussion and limitations
Conclusions
Implementation details
CL strategies
...and 8 more sections

Figures (13)

Figure 1: In a two-task continual learning scenario, supervised learning (SL) results in representations that perform well on the second task but poorly on the first task due to high forgetting. On the other hand, representations trained with self-supervised learning (SSL) have higher first-task performance but they underperform on the second task. We show that simple modifications to supervised learning (SL+MLP) yield representations that are superior on the first task and on par with SL on the second task. We report average task-aware k-NN accuracy on 6 different 2-task combinations of CIFAR100, CIFAR10 and SVHN datasets (3 runs for each scenario).
Figure 2: SL finetuning underperforms compared to SSL. However, when equipped with the MLP projector, it consistently outperforms SSL. We report the difference in k-NN accuracy (%) between supervised approaches and SSL.
Figure 3: SL+MLP: (1) achieves strong performance after the initial task compared to SL, which indicates that it produces representations that are transferable to unseen tasks; (2) is the only method that is able to accumulate knowledge learned on a sequence of tasks. We report task-agnostic k-NN accuracy after each task on the whole dataset (note that not-yet-seen tasks are also included in the evaluation).
Figure 4: Representations learned by SL+MLP are more transferable than those learned by SL and SSL. They also improve when trained on new tasks. We present the results of the models trained continually on ImageNet/5 and evaluated after each task. We report k-NN accuracy (%) on a set of 8 diverse downstream classification tasks and an average performance.
Figure 5: Singular value spectra of the 512-dimensional representation space. Representations learned with SL+MLP (right) exhibit properties that are desirable from the continual learning point of view: (1) they consist of a more diverse set of features (in contrast to SL, left); (2) their feature diversity improves when learning new tasks, consistently across all the presented settings. Singular values are sorted in descending order and normalized by $\sigma^1$ (the largest singular value); the scale is logarithmic. Vertical dashed lines mark the number of singular values needed to explain 95% of the variance. Intuitively, this indicates how many relevant independent features the representation contains.
...and 8 more figures
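The spectral analysis described in the Figure 5 caption can be approximated with a short NumPy sketch: compute the singular values of a feature matrix, normalize them by the largest one, and count how many are needed to explain 95% of the variance. This is a sketch under assumptions: the features are mean-centered first, "variance explained" is taken as the cumulative share of squared singular values, and the function names and toy matrices are hypothetical.

```python
import numpy as np

def singular_values(features):
    """Singular values of mean-centered features, in descending order."""
    centered = features - features.mean(axis=0, keepdims=True)
    return np.linalg.svd(centered, compute_uv=False)

def normalized_spectrum(features):
    """Spectrum divided by the largest singular value (plotted on a log scale)."""
    s = singular_values(features)
    return s / s[0]

def n_components_for_variance(features, threshold=0.95):
    """Smallest number of singular directions whose squared singular values
    account for at least `threshold` of the total variance."""
    var = singular_values(features) ** 2
    cumulative = np.cumsum(var) / var.sum()
    return int(np.searchsorted(cumulative, threshold) + 1)

rng = np.random.default_rng(0)

# Toy "collapsed" representation: 64 dimensions but only 3 independent features.
low_rank = rng.standard_normal((1000, 3)) @ rng.standard_normal((3, 64))
# Toy "diverse" representation: 64 roughly independent features.
diverse = rng.standard_normal((1000, 64))

n_low = n_components_for_variance(low_rank)
n_diverse = n_components_for_variance(diverse)
spectrum = normalized_spectrum(diverse)
print(n_low, n_diverse)
```

A larger count means the variance is spread over more directions, which is the sense in which the caption speaks of a more diverse set of features.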
