
Exploring the Stability Gap in Continual Learning: The Role of the Classification Head

Wojciech Łapacz, Daniel Marczak, Filip Szatkowski, Tomasz Trzciński

TL;DR

This work introduces the nearest-mean classifier (NMC) as a tool to attribute the influence of the backbone and the classification head on the stability gap, and suggests that the primary contributor to this phenomenon is the linear head rather than insufficient representation learning.

Abstract

Continual learning (CL) has emerged as a critical area in machine learning, enabling neural networks to learn from evolving data distributions while mitigating catastrophic forgetting. However, recent research has identified the stability gap, a phenomenon in which models initially lose performance on previously learned tasks and then partially recover later in training. Such learning dynamics contradict the intuitive understanding of stability in continual learning, where one would expect performance to degrade gradually rather than drop sharply and then partially recover. To better understand and alleviate the stability gap, we investigate it at different levels of the neural network architecture, focusing in particular on the role of the classification head. We introduce the nearest-mean classifier (NMC) as a tool to attribute the influence of the backbone and the classification head on the stability gap. Our experiments demonstrate that NMC not only improves final performance but also significantly enhances training stability across various continual learning benchmarks, including CIFAR100, ImageNet100, CUB-200, and FGVC-Aircraft. Moreover, we find that NMC also reduces task-recency bias. Our analysis provides new insights into the stability gap and suggests that the primary contributor to this phenomenon is the linear head rather than insufficient representation learning.
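The NMC described above replaces the trained linear head with a non-parametric rule: each class is represented by the mean of its feature vectors, and a sample is assigned to the class whose mean is closest. Below is a minimal pure-Python sketch, assuming backbone features have already been extracted. The function names `fit_class_means` and `nmc_predict` are hypothetical, and the L2 normalization of features and means (common in iCaRL-style NMC) and the Euclidean distance are assumptions rather than details confirmed by this summary.

```python
import math
from collections import defaultdict


def l2_normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fit_class_means(features, labels, normalize=True):
    """Compute one prototype per class: the mean of its feature vectors.

    Assumption: features (and the resulting means) are L2-normalized
    when normalize=True, as in iCaRL-style nearest-mean classifiers.
    """
    sums = {}
    counts = defaultdict(int)
    for f, y in zip(features, labels):
        f = l2_normalize(f) if normalize else list(f)
        if y not in sums:
            sums[y] = [0.0] * len(f)
        sums[y] = [s + x for s, x in zip(sums[y], f)]
        counts[y] += 1
    means = {y: [s / counts[y] for s in sums[y]] for y in sums}
    return {y: (l2_normalize(m) if normalize else m) for y, m in means.items()}


def nmc_predict(feature, class_means, normalize=True):
    """Return the label whose class mean is nearest in squared Euclidean distance."""
    f = l2_normalize(feature) if normalize else list(feature)

    def sq_dist(m):
        return sum((a - b) ** 2 for a, b in zip(f, m))

    return min(class_means, key=lambda y: sq_dist(class_means[y]))


# Toy usage with made-up 2-D "features" for two classes.
features = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]]
labels = [0, 0, 1, 1]
means = fit_class_means(features, labels)
print(nmc_predict([0.8, 0.2], means), nmc_predict([0.2, 0.9], means))
```

Because the prototypes are recomputed from whatever the current backbone produces, this classifier has no trainable head weights that can drift toward the most recent task. That is the intuition behind using NMC to separate the backbone's contribution from the head's: if first-task accuracy stays stable under NMC but not under the linear head, the representations themselves still carry the old-task information.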

Paper Structure

This paper contains 23 sections, 6 equations, 20 figures, 5 tables.

Figures (20)

  • Figure 1: Stability gap phenomenon while learning the first two tasks of the CIFAR100 5-task split. Evaluated with a standard linear head and with a Nearest-Mean Classifier (NMC), first-task performance under NMC is more stable throughout the learning phase and reaches a better final accuracy, even though both classifiers use the same representations.
  • Figure 2: Oracle NMC leads to significantly better CL performance than a linear classification head, with smaller performance drops at task boundaries and higher final accuracy. We report first-task accuracy in %.
  • Figure 3: First-task accuracy (%) for NMC and the linear head on standard continual learning benchmarks. NMC shows higher performance and stability throughout training across all evaluated benchmarks.
  • Figure 4: First-task accuracy (%) for regular training (left) and Slow Learner (right) on fine-grained classification benchmarks when starting from a pre-trained model. NMC improves stability in this setting as well.
  • Figure 5: Average accuracy for fine-tuning (FT) and NMC over the course of continual learning on CIFAR100 with different memory budgets. Results with a fixed memory size are shown on the left and with a growing memory on the right. NMC outperforms FT regardless of the memory selection algorithm.
  • ...and 15 more figures
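Several of the figures above track first-task accuracy across task boundaries. One simple way to summarize such a curve is sketched below, assuming accuracy is logged at regular evaluation steps. The function `stability_gap_stats` and its quantities (the drop to the trough after the boundary, the recovery from that trough, and the net loss by the end) are illustrative definitions chosen here, not metrics taken from the paper.

```python
def stability_gap_stats(acc_curve, boundary):
    """Summarize a first-task accuracy curve around one task boundary.

    acc_curve: first-task accuracy (%) at each evaluation step.
    boundary:  index of the first evaluation step after training on the
               new task begins (must satisfy 1 <= boundary < len(acc_curve)).
    """
    if not 1 <= boundary < len(acc_curve):
        raise ValueError("boundary must split the curve into two non-empty parts")
    before = acc_curve[boundary - 1]  # accuracy right before the task switch
    after = acc_curve[boundary:]      # accuracies while learning the new task
    trough = min(after)               # lowest point of the gap
    trough_step = boundary + after.index(trough)
    final = after[-1]
    return {
        "drop": before - trough,             # how far accuracy falls
        "recovery": final - trough,          # how much is regained afterwards
        "final_forgetting": before - final,  # net loss at the end of the task
        "trough_step": trough_step,          # evaluation step of the trough
    }


# Toy curve (made-up numbers, not results from the paper).
print(stability_gap_stats([80.0, 81.0, 40.0, 55.0, 60.0], boundary=2))
```

On this toy curve the function reports a drop of 41.0, a recovery of 20.0, a final forgetting of 21.0, and the trough at step 2. Computing the same statistics for the linear-head and NMC curves would put a comparison like the one in Figure 1 into numbers.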