S2IL: Structurally Stable Incremental Learning

S Balasubramanian, Yedu Krishna P, Talasu Sai Sriram, M Sai Subramaniam, Manepalli Pranav Phanindra Sai, Darshan Gera

TL;DR

The paper addresses catastrophic forgetting in class-incremental learning by moving beyond exact feature matching to preserve structural relationships within features. It introduces Structurally Stable Incremental Learning ($S^2IL$), which applies an SSIM-based distillation loss to the last convolutional layer and combines it with standard classification loss to balance stability and plasticity. Across CIFAR-100, ImageNet-100, and ImageNet-1K, $S^2IL$ achieves state-of-the-art incremental accuracy, particularly in settings with many incremental tasks, and ablations confirm the importance of structure-focused distillation, last-layer emphasis, and a memory strategy that allocates a fixed total exemplar budget. Overall, $S^2IL$ demonstrates that preserving spatial structure rather than exact feature alignment yields robust long-term performance in CIL with practical memory considerations.
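To make the idea of an SSIM-based distillation loss concrete, the sketch below computes a global SSIM score per channel between the last-layer feature maps of the previous and current models, and adds `1 - SSIM` to a classification loss. This is an illustration under stated assumptions rather than the authors' implementation: the function names, the global (non-windowed) SSIM, the stabilizing constants `c1` and `c2`, and the weight `lam` are all hypothetical choices that the summary does not specify.

```python
import numpy as np

def ssim_per_channel(x, y, c1=1e-4, c2=9e-4):
    """Global SSIM between feature maps of shape (C, H, W); one score per channel.

    Assumption: statistics are taken over the whole spatial map instead of
    local windows, and c1, c2 are illustrative stabilizing constants.
    """
    x = x.reshape(x.shape[0], -1)
    y = y.reshape(y.shape[0], -1)
    mu_x, mu_y = x.mean(axis=1), y.mean(axis=1)
    var_x, var_y = x.var(axis=1), y.var(axis=1)
    cov_xy = ((x - mu_x[:, None]) * (y - mu_y[:, None])).mean(axis=1)
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

def ssim_distillation_loss(old_feat, new_feat):
    """Structure-preserving penalty: close to 0 when the two maps match."""
    return 1.0 - ssim_per_channel(old_feat, new_feat).mean()

def total_loss(ce_loss, old_feat, new_feat, lam=1.0):
    """Classification loss plus weighted SSIM distillation (lam is hypothetical)."""
    return ce_loss + lam * ssim_distillation_loss(old_feat, new_feat)
```

Because SSIM compares means, variances, and covariance instead of element-wise differences, the penalty depends on how the spatial pattern of a feature map co-varies with its earlier version, which is the sense in which the summary contrasts structure preservation with exact feature matching.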

Abstract

Feature Distillation (FD) strategies are proven to be effective in mitigating Catastrophic Forgetting (CF) seen in Class Incremental Learning (CIL). However, current FD approaches enforce strict alignment of feature magnitudes and directions across incremental steps, limiting the model's ability to adapt to new knowledge. In this paper we propose Structurally Stable Incremental Learning ($S^2IL$), an FD method for CIL that mitigates CF by preserving the overall spatial patterns of features, which promotes representations that are flexible (plasticity) yet stable enough to retain old knowledge (stability). We also demonstrate that $S^2IL$ achieves strong incremental accuracy and outperforms other FD methods on the SOTA benchmark datasets CIFAR-100, ImageNet-100 and ImageNet-1K. Notably, $S^2IL$ outperforms other methods by a significant margin in scenarios that have a large number of incremental tasks.

Paper Structure

This paper contains 20 sections, 8 equations, 3 figures, 9 tables, 1 algorithm.

Figures (3)

  • Figure 1: A motivation for exploring structure-based feature distillation (FD): (a) Average feature importance $\rho$ across increments from the last convolutional layer of two SOTA FD models, EXACFS and AFC, evaluated on CIFAR-100 with an Inc 10 setting. Both enforce feature similarity (magnitude and direction) between increments. A feature is deemed important if it significantly influences the loss. Surprisingly, $\rho$ remains nearly constant at $1$ for all features in both models, which limits plasticity because feature similarity is enforced equally on all corresponding features. This highlights the need for an FD approach that balances stability with plasticity. (b) Heatmap comparing class-wise deviations in Grad-CAM feature importances of various models from those of the Oracle model $O$. $O$ is trained like any other CIL model except that it has access to all past and current training data. $S^2IL$ shows significantly lower deviation from $O$, suggesting that accounting for feature structure results in better generalization.
  • Figure 2: Figure 1a from the manuscript, extended to other incremental settings on the CIFAR-100 dataset. The importance values computed by AFC [22] and EXACFS [4] are predominantly flattened to 1.
  • Figure 3: Boxplot of deviations in Grad-CAM feature map importance values of various models compared to the Oracle model on the CIFAR-100 dataset.