Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation

Zhengyuan Xie, Haiquan Lu, Jia-wen Xiao, Enguang Wang, Le Zhang, Xialei Liu

TL;DR

Class Incremental Semantic Segmentation (CISS) suffers from catastrophic forgetting and background shift as new classes arrive. NeST introduces a pre-tuning stage that learns a linear transformation from all old classifiers to generate each new classifier, using an importance matrix $\mathbf{M}_c$ and a projection matrix $\mathbf{P}_c$ so that $\mathbf{w}_{c} = (\mathbf{M}_c \odot \mathbf{W}_{old}) \mathbf{P}_c$. The backbone is frozen during pre-tuning, which is trained with an unbiased cross-entropy loss. The transformation matrices are initialized via cross-task class similarity, biasing old-class channels toward relevant new classes to balance stability and plasticity. Experiments on Pascal VOC 2012 and ADE20K with two backbones (ResNet-101 and Swin-B) show that NeST consistently improves baseline methods (MiB, PLOP, RCIL), often with double-digit gains on old classes while maintaining strong new-class performance. The approach integrates easily with existing CISS pipelines, at the cost of modest additional compute during pre-tuning.
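As a minimal sketch of the transformation $\mathbf{w}_{c} = (\mathbf{M}_c \odot \mathbf{W}_{old}) \mathbf{P}_c$, the snippet below computes one new classifier from old classifier weights. The shapes are assumptions made for illustration ($\mathbf{W}_{old}$ and $\mathbf{M}_c$ as $d \times n_{old}$, $\mathbf{P}_c$ as a length-$n_{old}$ vector), the function name `generate_new_classifier` is hypothetical, and plain Python lists are used only to keep the example dependency-free. It is not the authors' implementation.

```python
def generate_new_classifier(W_old, M_c, P_c):
    """Compute w_c = (M_c ⊙ W_old) P_c.

    Assumed shapes (illustrative, not taken from the paper's code):
      W_old, M_c : d x n_old nested lists (one column per old classifier)
      P_c        : list of n_old floats
    Returns a list of d floats, the generated weight vector for new class c.
    """
    d = len(W_old)
    n_old = len(P_c)
    w_c = []
    for i in range(d):
        # Weight channel i of every old classifier element-wise (M_c ⊙ W_old),
        # then project across old classes with P_c.
        w_c.append(sum(M_c[i][j] * W_old[i][j] * P_c[j] for j in range(n_old)))
    return w_c


# Toy example: d = 2 channels, n_old = 3 old classifiers (column 0 = background).
W_old = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0]]
M_c = [[1.0, 1.0, 1.0],
       [1.0, 1.0, 1.0]]          # all-ones importance: no channel re-weighting
P_c = [1.0, 0.0, 0.0]            # one-hot projection onto the background column
w_c = generate_new_classifier(W_old, M_c, P_c)
print(w_c)
```

With an all-ones $\mathbf{M}_c$ and a one-hot $\mathbf{P}_c$, the transformation simply copies one old classifier's weights, which loosely resembles background-based initialization. In NeST, $\mathbf{M}_c$ and $\mathbf{P}_c$ are the quantities learned during pre-tuning, so the generated classifier can mix channels from all old classifiers.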

Abstract

Class incremental semantic segmentation aims to preserve old knowledge while learning new tasks; however, it is impeded by catastrophic forgetting and background shift. Prior works indicate the pivotal importance of initializing new classifiers, and mainly focus on transferring knowledge from the background classifier or preparing classifiers for future classes, neglecting the flexibility and variance of new classifiers. In this paper, we propose a new classifier pre-tuning (NeST) method applied before the formal training process, which learns a transformation from old classifiers to generate new classifiers for initialization rather than directly tuning the parameters of the new classifiers. Our method aligns new classifiers with the backbone and adapts them to the new data, preventing drastic changes in the feature extractor when learning new classes. In addition, we design a strategy that uses cross-task class similarity to initialize the matrices used in the transformation, helping achieve the stability-plasticity trade-off. Experiments on the Pascal VOC 2012 and ADE20K datasets show that the proposed strategy significantly improves the performance of previous methods. The code is available at https://github.com/zhengyuan-xie/ECCV24_NeST.

Paper Structure (18 sections, 14 equations, 9 figures, 12 tables, 1 algorithm)

This paper contains 18 sections, 14 equations, 9 figures, 12 tables, 1 algorithm.

Figures (9)

  • Figure 1: Different classifier initialization methods for class incremental semantic segmentation. MiB directly uses the background classifier to initialize new classifiers. Some methods (SSUL, DKD) train an auxiliary classifier for future classes. AWT selects the most relevant weights from the background classifier for new classifiers' initialization by gradient-based attribution. Our new classifier pre-tuning method learns a transformation from all old classifiers to generate new classifiers for initialization.
  • Figure 2: Illustration of our new classifier pre-tuning (NeST) method. The left side of the figure is an iteration of the new classifier pre-tuning process. The right side of the figure represents the cross-task class similarity-based initialization of importance matrices and projection matrices before the pre-tuning process.
  • Figure 3: The mIoU (%) at each step for the 15-1 (a) and 10-1 (b) settings.
  • Figure 4: The feature map similarity and training loss on the Pascal VOC 2012 15-1 overlapped setting.
  • Figure 5: Qualitative comparisons on the Pascal VOC 2012 15-1 overlapped setting.
  • ...and 4 more figures