Revisiting Supervision for Continual Representation Learning
Daniel Marczak, Sebastian Cygert, Tomasz Trzciński, Bartłomiej Twardowski
TL;DR
The paper addresses the paradox that supervised signals can degrade continual representation learning. It shows that adding a simple MLP projector to supervised training (SL+MLP) yields representations that outperform those of self-supervised learning (SSL) methods in continual finetuning and transfer. It combines SL+MLP with various continual learning (CL) strategies and conducts extensive experiments across multiple CIFAR and SVHN sequences and downstream tasks, using k-NN/NMC/CKA metrics and spectral analyses. The key contributions are: (1) empirical demonstration that SL+MLP can surpass SSL in continual settings, (2) demonstration of synergistic gains when coupling SL+MLP with CL methods, (3) in-depth analysis of representation quality, diversity, and stability through forgetting measures, EXC, and eigenvalue spectra, and (4) ablation studies clarifying the critical role of the MLP projector, particularly batch normalization (BN) and basic architectural components. The findings suggest that the projection head is a central factor in shaping the transferability and diversity of representations in continual learning, with practical implications for designing continual learners that use labeled data efficiently.
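The NMC metric mentioned above scores a frozen representation by classifying each test feature according to its nearest class mean. The following is a minimal sketch of that idea in plain Python, not the paper's evaluation code. It assumes Euclidean distance and feature vectors given as lists of floats; the function names `fit_class_means`, `nmc_predict`, and `nmc_accuracy` are hypothetical.

```python
from collections import defaultdict
from math import dist


def fit_class_means(features, labels):
    """Return {label: mean feature vector} computed from training features."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = list(vec)
        else:
            sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def nmc_predict(means, vec):
    """Predict the label whose class mean is closest (Euclidean, an assumption)."""
    return min(means, key=lambda label: dist(means[label], vec))


def nmc_accuracy(means, features, labels):
    """Fraction of feature vectors assigned to their true label."""
    correct = sum(nmc_predict(means, v) == y for v, y in zip(features, labels))
    return correct / len(labels)


# Toy 2-D "features" standing in for frozen backbone outputs.
train_x = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
train_y = [0, 0, 1, 1]
means = fit_class_means(train_x, train_y)

test_x = [[0.5, 1.0], [5.0, 5.0]]
test_y = [0, 1]
acc = nmc_accuracy(means, test_x, test_y)
```

In a continual setting, the same procedure would typically be applied to features extracted after each task, so that accuracy changes reflect changes in the representation rather than in a trained classifier head.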
Abstract
In the field of continual learning, models are designed to learn tasks one after the other. While most research has centered on supervised continual learning, there is a growing interest in unsupervised continual learning, which makes use of the vast amounts of unlabeled data. Recent studies have highlighted the strengths of unsupervised methods, particularly self-supervised learning, in providing robust representations. The improved transferability of representations built with self-supervised methods is often attributed to the multi-layer perceptron projector. In this work, we start from this observation and reexamine the role of supervision in continual representation learning. We argue that additional information, such as human annotations, should not deteriorate the quality of representations. Our findings show that supervised models, when enhanced with a multi-layer perceptron head, can outperform self-supervised models in continual representation learning. This highlights the importance of the multi-layer perceptron projector in shaping feature transferability across a sequence of tasks in continual learning. The code is available on GitHub: https://github.com/danielm1405/sl-vs-ssl-cl.
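To illustrate what a supervised model with a multi-layer perceptron head might look like, here is a minimal PyTorch sketch. It is not taken from the linked repository: the class name `SLWithMLP`, the projector layout (Linear, BatchNorm, ReLU, Linear), the layer widths, and the toy backbone are all assumptions chosen for illustration. The point it shows is that the cross-entropy loss is applied after the projector, while downstream evaluation uses the backbone features taken before it.

```python
import torch
import torch.nn as nn


class SLWithMLP(nn.Module):
    """Supervised model with an MLP projector between backbone and classifier."""

    def __init__(self, backbone, feat_dim, hidden_dim, proj_dim, num_classes):
        super().__init__()
        self.backbone = backbone
        # Assumed projector layout; the paper's exact architecture may differ.
        self.projector = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, proj_dim),
        )
        self.classifier = nn.Linear(proj_dim, num_classes)

    def forward(self, x):
        # Training path: backbone -> projector -> classifier logits.
        return self.classifier(self.projector(self.backbone(x)))

    @torch.no_grad()
    def represent(self, x):
        # Evaluation path: features before the projector, without gradients.
        return self.backbone(x)


# Toy backbone standing in for a real encoder such as a ResNet.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU())
model = SLWithMLP(backbone, feat_dim=64, hidden_dim=128,
                  proj_dim=32, num_classes=10)

x = torch.randn(8, 3, 32, 32)      # a batch of CIFAR-sized random inputs
y = torch.randint(0, 10, (8,))     # random labels for illustration only

logits = model(x)
loss = nn.functional.cross_entropy(logits, y)
loss.backward()                    # gradients reach backbone via the projector

model.eval()
features = model.represent(x)      # what k-NN or NMC evaluation would consume
```

Under this design, the projector and classifier are discarded after training, and only the backbone is carried forward to later tasks or downstream evaluation.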
