EXACFS -- A CIL Method to mitigate Catastrophic Forgetting

S Balasubramanian, M Sai Subramaniam, Sai Sriram Talasu, Yedu Krishna P, Manepalli Pranav Phanindra Sai, Ravi Mukkamala, Darshan Gera

TL;DR

EXACFS tackles catastrophic forgetting in class incremental learning by introducing class-wise feature significance and exponential aging to selectively preserve informative features. It combines a novel feature distillation loss with a PODNet-style classification loss, using exemplars to keep old representations aligned while learning new classes. Empirical results on CIFAR-100 and ImageNet-100 demonstrate state-of-the-art performance and a robust stability-plasticity trade-off across varying numbers of incremental tasks. This work advances continual learning by explicitly modeling feature importance per class and adapting that significance over time.

Abstract

Deep neural networks (DNNs) excel at learning from static datasets but struggle with continual learning, where data arrives sequentially. Catastrophic forgetting, the phenomenon of forgetting previously learned knowledge, is a primary challenge. This paper introduces EXponentially Averaged Class-wise Feature Significance (EXACFS) to mitigate this issue in the class incremental learning (CIL) setting. EXACFS estimates the significance of model features for each learned class using loss gradients, gradually ages that significance across the incremental tasks, and preserves the significant features through a distillation loss. In doing so, it balances remembering old knowledge (stability) with learning new knowledge (plasticity). Extensive experiments on CIFAR-100 and ImageNet-100 demonstrate EXACFS's superior performance in preserving stability while acquiring plasticity.
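The first two steps named in the abstract (gradient-based class-wise significance, then aging across tasks) can be illustrated with a minimal NumPy sketch. It is not the paper's formulation: the linear softmax classifier, the use of the mean absolute gradient as the significance score, and the names `class_feature_significance`, `age_significance`, and `alpha` are all assumptions made for illustration.

```python
import numpy as np

def class_feature_significance(features, labels, weights, num_classes):
    """Per-class significance of each feature dimension.

    Assumption: significance is the mean absolute gradient of the
    cross-entropy loss w.r.t. the features, for a linear softmax
    classifier with logits = features @ weights.
    features: (n, d), labels: (n,), weights: (d, num_classes).
    """
    logits = features @ weights
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    onehot = np.eye(num_classes)[labels]
    # d(loss)/d(features) for softmax cross-entropy with a linear head
    grad_feat = (probs - onehot) @ weights.T
    sig = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            sig[c] = np.abs(grad_feat[mask]).mean(axis=0)
    return sig

def age_significance(old_sig, new_sig, alpha=0.5):
    """Exponential averaging across tasks (alpha is a hypothetical decay rate)."""
    return alpha * old_sig + (1.0 - alpha) * new_sig
```

With this form, a larger `alpha` keeps the significance estimated in earlier tasks for longer, while a smaller `alpha` lets the estimate from the current task dominate.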

Paper Structure

This paper contains 18 sections, 8 equations, 3 figures, 5 tables, 2 algorithms.

Figures (3)

  • Figure 1: Schematic of the EXACFS method. At incremental task $t+1$, the input comprises samples from new classes and a few exemplars from classes of earlier tasks. Class labels are shown by colour coding. A shared but incrementally updated feature extractor extracts features. Features that influence a class decision are colour-coded accordingly, with the intensity of the colour conveying the level of significance. Colourless features do not influence any input class. A feature may influence more than one class, to different degrees. During training on task $t+1$, a distillation loss is added to the classifier loss to promote stability; it constrains the features of each exemplar to remain similar to their earlier representation from task $t$. The feature similarity across tasks $t$ and $t+1$ is weighted by the exponentially averaged feature significance of the class the exemplar belongs to. Refer to Section \ref{sec:EXACFS} for more details.
  • Figure 2: Test accuracy of $10$ classes from the set of classes used for base training across the incremental tasks.
  • Figure 3: Per-task test accuracies of the model trained on the last incremental task across different memory budgets.
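
The significance-weighted feature similarity described in the Figure 1 caption can be sketched as a loss on exemplar features. This is a hedged illustration only: the squared-difference form and the name `weighted_feature_distillation` are assumptions, and the paper's actual distillation loss may differ.

```python
import numpy as np

def weighted_feature_distillation(old_feats, new_feats, labels, significance):
    """Significance-weighted drift between task-t and task-(t+1) features.

    Assumption: squared difference per feature dimension, weighted by the
    significance row of the exemplar's class, averaged over exemplars.
    old_feats, new_feats: (n, d); labels: (n,); significance: (classes, d).
    """
    per_sample_weights = significance[labels]   # (n, d), one row per exemplar
    sq_diff = (new_feats - old_feats) ** 2
    return float((per_sample_weights * sq_diff).sum(axis=1).mean())

# Two exemplars: drift on a significant feature is penalised in proportion
# to that feature's significance for the exemplar's class.
old = np.zeros((2, 3))
new = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
sig = np.array([[1.0, 0.0, 0.0], [0.0, 0.5, 0.0]])
loss = weighted_feature_distillation(old, new, np.array([0, 1]), sig)
```

Under this sketch, features with zero significance for a class are free to change, which is where the plasticity for new classes comes from.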