The Expanding Scope of the Stability Gap: Unveiling its Presence in Joint Incremental Learning of Homogeneous Tasks

Sandesh Kamath; Albin Soutif-Cormerais; Joost van de Weijer; Bogdan Raducanu

TL;DR

The paper demonstrates that the stability gap, a transient drop in performance on previously learned tasks at the start of a new task, also occurs during joint incremental learning on homogeneous task distributions. It shows there exists a low-loss linear path between task minima, defined by $\theta_{\lambda} = \lambda \theta_1 + (1-\lambda) \theta_2$, but SGD does not follow this path and initially traverses higher-loss regions; mini-batch analysis reveals that per-batch improvements do not translate into better test performance. The findings hold across architectures and data splits, and removing rehearsal further amplifies the gap, highlighting that optimization dynamics, not just data or task heterogeneity, drive the phenomenon. This points to focusing on optimization strategies and path-aware training approaches to mitigate the stability gap in practical continual learning systems.
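The linear path between two minima mentioned above can be illustrated with a minimal Python sketch. Parameters are flattened into plain lists, and `toy_loss`, `theta_warm`, and `theta_final` are hypothetical stand-ins chosen for illustration (the paper evaluates real networks); which endpoint plays the role of $\theta_1$ is also an assumption here.

```python
from typing import Callable, List, Sequence, Tuple


def interpolate(theta_1: Sequence[float], theta_2: Sequence[float], lam: float) -> List[float]:
    """Return theta_lambda = lam * theta_1 + (1 - lam) * theta_2, element-wise."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(theta_1, theta_2)]


def loss_along_path(
    loss_fn: Callable[[Sequence[float]], float],
    theta_1: Sequence[float],
    theta_2: Sequence[float],
    num_points: int = 11,
) -> List[Tuple[float, float]]:
    """Evaluate loss_fn at evenly spaced lambda values in [0, 1]."""
    lams = [i / (num_points - 1) for i in range(num_points)]
    return [(lam, loss_fn(interpolate(theta_1, theta_2, lam))) for lam in lams]


def toy_loss(theta: Sequence[float]) -> float:
    """Hypothetical convex bowl with its minimum at (1, 1); not the paper's loss."""
    return sum((t - 1.0) ** 2 for t in theta)


# Assumption: theta_1 is the warm-start model, theta_2 the final model.
theta_warm = [0.0, 0.0]
theta_final = [1.0, 1.0]

path = loss_along_path(toy_loss, theta_warm, theta_final, num_points=5)
for lam, loss in path:
    print(f"lambda={lam:.2f}  loss={loss:.3f}")
```

With real networks, the same interpolation would be applied to each parameter tensor, and the loss at each $\lambda$ would be compared against the loss measured along the actual SGD trajectory.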

Abstract

Recent research identified a temporary performance drop on previously learned tasks when transitioning to a new one. This drop is called the stability gap and has great consequences for continual learning: it complicates the direct employment of continual learning since the worst-case performance at task boundaries is dramatic, it limits its potential as an energy-efficient training paradigm, and finally, the stability drop could result in a reduced final performance of the algorithm. In this paper, we show that the stability gap also occurs when applying joint incremental training of homogeneous tasks. In this scenario, the learner continues training on the same data distribution and has access to all data from previous tasks. In addition, we show that in this scenario, there exists a low-loss linear path to the next minimum, but that SGD optimization does not choose this path. We perform further analysis, including a finer batch-wise analysis, which could provide insights towards potential solution directions.

Paper Structure (8 sections, 8 figures, 1 table)

Introduction
Stability Gap Analysis
Experimental setup
Stability Gap in Joint Incremental Learning of Homogeneous Tasks
Linear Mode Connectivity
Additional Analysis
Conclusions
Acknowledgement.

Figures (8)

Figure 1: Occurrence of the stability gap in joint incremental learning with homogeneous tasks in the $50\text{-}50^*$ setting on (left) CIFAR-10 and (right) CIFAR-100 datasets on a ResNet-18 model. This plot starts after training with task A, and the x-axis represents the number of iterations of training on task B.
Figure 2: In the 50-50* setting, we present the loss path with SGD and the linear connectivity loss path between the warm-start and final models using a ResNet-18 model on the (left) CIFAR-10 and (right) CIFAR-100 datasets. In order to observe the stability gap, we zoom in on the first few iterations of the new task.
Figure 3: Using CIFAR-100 with ResNet-18, we present a finer analysis of the local improvement obtained at the batch level by observing the train accuracy per batch before (blue line) and after (red line) the SGD update is applied for the batch in the 50-50* setting. The black line is the corresponding test accuracy.
Figure 4: Using CIFAR-100 with VGG-16, the stability gap in the (left) 50-50* and (right) 75-25* settings.
Figure 5: Using CIFAR-100 with ResNet-18, the stability gap in the (left) 10-90* and (right) 75-25* settings. We can see that the stability gap increases for a smaller-sized first task.
...and 3 more figures
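
The batch-level measurement described in the Figure 3 caption (train accuracy on each mini-batch before and after its SGD update, alongside test accuracy) can be sketched as follows. This is a minimal illustration only: a tiny logistic-regression model on synthetic data stands in for ResNet-18 on CIFAR-100, and every name here (`make_data`, `sgd_step`, `batchwise_trace`) is hypothetical rather than taken from the paper.

```python
import math
import random
from typing import Dict, List, Tuple

Example = Tuple[List[float], int]  # (features, binary label)


def make_data(n: int, rng: random.Random) -> List[Example]:
    """Synthetic two-class data; a stand-in for a real dataset."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        center = 2.0 * y - 1.0  # class 0 near -1, class 1 near +1
        data.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], y))
    return data


def logit(w: List[float], b: float, x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def accuracy(w: List[float], b: float, data: List[Example]) -> float:
    correct = sum(1 for x, y in data if (logit(w, b, x) > 0) == (y == 1))
    return correct / len(data)


def sgd_step(w: List[float], b: float, batch: List[Example], lr: float):
    """One SGD step on the mean logistic loss of the batch."""
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, y in batch:
        p = 1.0 / (1.0 + math.exp(-logit(w, b, x)))
        for j, xj in enumerate(x):
            grad_w[j] += (p - y) * xj
        grad_b += p - y
    n = len(batch)
    return [wi - lr * g / n for wi, g in zip(w, grad_w)], b - lr * grad_b / n


def batchwise_trace(w, b, train, test, batch_size: int, lr: float) -> List[Dict[str, float]]:
    """Record batch accuracy before/after each update, plus test accuracy."""
    trace = []
    for start in range(0, len(train), batch_size):
        batch = train[start:start + batch_size]
        before = accuracy(w, b, batch)   # before the update on this batch
        w, b = sgd_step(w, b, batch, lr)
        after = accuracy(w, b, batch)    # after the update on this batch
        trace.append({"before": before, "after": after, "test": accuracy(w, b, test)})
    return trace


rng = random.Random(0)
train, test = make_data(64, rng), make_data(64, rng)
trace = batchwise_trace([0.0, 0.0], 0.0, train, test, batch_size=16, lr=0.1)
for i, row in enumerate(trace):
    print(i, row)
```

Comparing the `after` column with the `test` column is what reveals whether a local gain on the current batch carries over to held-out data.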
