Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation
Zhengyuan Xie, Haiquan Lu, Jia-wen Xiao, Enguang Wang, Le Zhang, Xialei Liu
TL;DR
Class Incremental Semantic Segmentation (CISS) faces catastrophic forgetting and background shift as new classes arrive. NeST introduces a pre-tuning stage that learns a linear transformation from all old classifiers to generate new classifiers, using two matrices $\mathbf{M}_c$ and $\mathbf{P}_c$ so that $\mathbf{w}_{c} = (\mathbf{M}_c \odot \mathbf{W}_{old}) \mathbf{P}_c$. During pre-tuning the backbone is frozen and the transformation is trained with an unbiased cross-entropy loss. NeST also initializes these transformation matrices via cross-task class similarity, biasing old-class channels toward relevant new classes to balance stability and plasticity. Extensive experiments on Pascal VOC 2012 and ADE20K across multiple backbones (ResNet-101 and Swin-B) show that NeST consistently improves baseline methods (MiB, PLOP, RCIL), often by double-digit gains for old classes while maintaining strong new-class performance. The approach is lightweight to integrate with existing CISS pipelines and demonstrates significant practical impact for continual segmentation in evolving environments, at the cost of modest additional compute during pre-tuning.
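To make the transformation $\mathbf{w}_{c} = (\mathbf{M}_c \odot \mathbf{W}_{old}) \mathbf{P}_c$ concrete, the following is a minimal sketch of the forward computation only. The shapes are assumptions not stated in this summary: $\mathbf{W}_{old}$ is taken to be $d \times n_{old}$ (one column per old class), $\mathbf{M}_c$ has the same shape, and $\mathbf{P}_c$ is a length-$n_{old}$ vector. The function name `generate_new_classifier` and the toy values are hypothetical, and the learning of $\mathbf{M}_c$ and $\mathbf{P}_c$ during pre-tuning is not shown.

```python
def generate_new_classifier(W_old, M_c, P_c):
    """Compute w_c = (M_c ⊙ W_old) P_c with plain Python lists.

    Assumed shapes (illustrative, not taken from the paper):
      W_old: d rows, each of length n_old (one column per old class)
      M_c:   same shape as W_old (element-wise weights)
      P_c:   length n_old (mixing weights over old classes)
    Returns w_c as a list of length d.
    """
    n_old = len(P_c)
    if len(M_c) != len(W_old):
        raise ValueError("M_c and W_old must have the same number of rows")
    for w_row, m_row in zip(W_old, M_c):
        if len(w_row) != n_old or len(m_row) != n_old:
            raise ValueError("every row must have n_old entries")
    # Element-wise product with M_c, then a weighted sum over old classes.
    return [
        sum(m * w * p for m, w, p in zip(m_row, w_row, P_c))
        for w_row, m_row in zip(W_old, M_c)
    ]


# Toy example: d = 3 channels, n_old = 2 old classes.
W_old = [[1.0, 3.0],
         [2.0, 4.0],
         [0.0, -2.0]]
M_ones = [[1.0, 1.0] for _ in range(3)]  # all-ones weights leave W_old unchanged
P_uniform = [0.5, 0.5]                   # uniform mixing over old classes

w_c = generate_new_classifier(W_old, M_ones, P_uniform)
print(w_c)  # [2.0, 3.0, -1.0]
```

With all-ones $\mathbf{M}_c$ and uniform $\mathbf{P}_c$, the generated classifier is simply the average of the old classifiers. Non-uniform values let individual channels and individual old classes contribute differently, which is the flexibility the transformation is meant to provide.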
Abstract
Class incremental semantic segmentation aims to preserve old knowledge while learning new tasks; however, it is impeded by catastrophic forgetting and background shift. Prior works indicate the pivotal importance of initializing new classifiers and mainly focus on transferring knowledge from the background classifier or preparing classifiers for future classes, neglecting the flexibility and variance of new classifiers. In this paper, we propose a new classifier pre-tuning (NeST) method that is applied before the formal training process and learns a transformation from old classifiers to generate new classifiers for initialization, rather than directly tuning the parameters of the new classifiers. Our method makes new classifiers align with the backbone and adapt to the new data, preventing drastic changes in the feature extractor when learning new classes. In addition, we design a strategy that considers cross-task class similarity to initialize the matrices used in the transformation, helping achieve the stability-plasticity trade-off. Experiments on the Pascal VOC 2012 and ADE20K datasets show that the proposed strategy can significantly improve the performance of previous methods. The code is available at https://github.com/zhengyuan-xie/ECCV24_NeST.
