Mitigating Background Shift in Class-Incremental Semantic Segmentation

Gilhan Park, WonJun Moon, SuBeen Lee, Tae-Young Kim, Jae-Pil Heo

TL;DR

This work tackles background shift in Class-Incremental Semantic Segmentation (CISS) by introducing a background-class separation framework. It combines selective pseudo-labeling to ignore unreliable old-class regions, adaptive feature distillation that weights knowledge transfer by patch reliability, and a separation strategy that decouples background from new classes via label-guided distillation and an orthogonality constraint on class tokens. The approach yields state-of-the-art performance on Pascal VOC and ADE20k under both disjoint and overlapped continual-learning settings, with strong evidence of improved stability (retaining old classes) and plasticity (learning new classes). The contributions offer practical improvements for continual semantic segmentation in dynamic environments where new concepts emerge over time.
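
The patch-reliability weighting behind adaptive feature distillation can be illustrated with a minimal NumPy sketch. The exact weighting used in the paper is not given here. This example therefore assumes that a patch's weight is the mean old-model confidence (the max softmax probability of S^{t-1}) over its pixels, that pixels ignored by the selective pseudo-label map contribute zero, and that the distillation term is a weighted squared L2 distance between old and new patch features. The names afd_weights, afd_loss, IGNORE and the patch size are hypothetical.

```python
import numpy as np

IGNORE = 255  # hypothetical ignore index used in the selective pseudo-label map


def afd_weights(old_probs, spl_map, patch=2):
    """Per-patch reliability weights (hypothetical formulation).

    old_probs: (C, H, W) softmax output S^{t-1} of the old model.
    spl_map:   (H, W) selective pseudo-label map; IGNORE marks dropped pixels.
    Returns an (H // patch, W // patch) array of weights in [0, 1].
    """
    conf = old_probs.max(axis=0)                   # per-pixel confidence of the old model
    conf = np.where(spl_map == IGNORE, 0.0, conf)  # ignored pixels carry no reliability
    h, w = conf.shape
    # Crop to a multiple of the patch size, then split into (patch x patch) blocks.
    blocks = conf[: h - h % patch, : w - w % patch].reshape(
        h // patch, patch, w // patch, patch)
    return blocks.mean(axis=(1, 3))                # average confidence per patch


def afd_loss(feat_old, feat_new, weights):
    """Reliability-weighted squared L2 distance between patch features.

    feat_old, feat_new: (N, D) patch features from the old and new models.
    weights:            (N,) per-patch weights, e.g. afd_weights(...).ravel().
    """
    per_patch = ((feat_new - feat_old) ** 2).sum(axis=1)
    return float((weights * per_patch).sum() / max(weights.sum(), 1e-8))
```

Patches the old model is unsure about, or that were dropped from the pseudo-label map, receive small or zero weight. As a result, the new model is pulled toward the old features mainly where the old knowledge is trustworthy.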

Abstract

Class-Incremental Semantic Segmentation (CISS) aims to learn new classes without forgetting old ones, using only the labels of the new classes. To achieve this, two popular strategies are employed: 1) pseudo-labeling and knowledge distillation to preserve prior knowledge; and 2) background weight transfer, which leverages the broad coverage of the background class when learning new classes by transferring the background weight to the new class classifier. However, the first strategy relies heavily on the old model to detect old classes, and pixels it fails to detect are regarded as background. This leads to a background shift towards the old classes (i.e., misclassification of old classes as background). Likewise, in the second approach, initializing the new class classifier with background knowledge triggers a similar background shift, but towards the new classes. To address these issues, we propose a background-class separation framework for CISS. First, selective pseudo-labeling and adaptive feature distillation are used to distill only trustworthy past knowledge. Second, we encourage separation between the background and new classes with a novel orthogonal objective along with label-guided output distillation. Our state-of-the-art results validate the effectiveness of the proposed methods.
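
Background weight transfer, and the reason it calls for a separation objective, can be sketched in a few lines. This sketch assumes that class tokens are stored as rows of a (K, D) matrix with the background at index 0, following the Pascal VOC labeling convention. The orthogonality term shown is a hypothetical stand-in, not necessarily the paper's formulation: it is the mean squared cosine similarity between each new-class token and the background token. The names add_new_class_tokens and orthogonality_loss are illustrative.

```python
import numpy as np


def add_new_class_tokens(cls_tokens, n_new, bg_index=0):
    """Background weight transfer: append n_new tokens copied from the background token.

    cls_tokens: (K, D) array of class tokens (assumed layout; background at bg_index).
    Returns a new (K + n_new, D) array; the input array is not modified.
    """
    bg = cls_tokens[bg_index]
    new = np.repeat(bg[None, :], n_new, axis=0)  # duplicate the background weights
    return np.concatenate([cls_tokens, new], axis=0)


def orthogonality_loss(cls_tokens, new_ids, bg_index=0):
    """Hypothetical orthogonality penalty between new-class tokens and the background.

    Computes the mean squared cosine similarity, which is 0 when every new token is
    orthogonal to the background token and 1 when they are parallel.
    """
    bg = cls_tokens[bg_index]
    bg = bg / np.linalg.norm(bg)
    new = cls_tokens[new_ids]                    # fancy indexing returns a copy
    new = new / np.linalg.norm(new, axis=1, keepdims=True)
    cos = new @ bg                               # cosine similarity per new token
    return float(np.mean(cos ** 2))
```

Right after the transfer, each new token is identical to the background token, so the cosine similarity is 1 and this penalty is at its maximum. Minimizing the penalty pushes the new-class tokens away from the background direction, while training still starts from the background's broad coverage.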

Paper Structure (36 sections, 11 equations, 8 figures, 9 tables)

Figures (8)

  • Figure 1: (a) Pseudo-labeling is used to learn unlabeled old classes in the image based on the prediction of the old model, while knowledge distillation aims to retain the intermediate knowledge of old classes by minimizing the difference between features from the old and new models. (b) However, typical models rarely recognize every pixel precisely. As a result, ambiguous pixels (those with low prediction confidence from the old model) are labeled as background, namely chair and sofa in Case 1 and horse in Case 2, causing a background shift towards old classes (i.e., misclassification of old classes as background). In contrast, our method alleviates the background shift towards old classes by ignoring these ambiguous pixels. (c) On the other hand, background weight transfer leverages the broad category coverage of the background class by initializing the new class token with the background token parameters. (d) Despite its advantages, the baseline model struggles to clearly distinguish new class tokens from the background token (i.e., background shift towards new classes). Conversely, our method achieves better separation of these classes while preserving the benefits of background weight transfer.
  • Figure 2: Overview of our background-class separation framework for CISS. Given a new class 'car' (grey) at step $t$ (purple), a new class token is added to $[cls]^t$, with its weight initially duplicated from the background class. (Blue) First, the image is fed into the old model from the previous step $t-1$ to generate the prediction $S^{t-1}$, (green) which is then used to compute the object identifier $O^t$ and the pseudo-label $\tilde{y}^t$. These are then combined to produce the selective pseudo-label map $\bar{y}^t$ used to train the new model at step $t$. (Yellow) Along with selective pseudo-labeling, we further calibrate how strongly old knowledge is distilled based on patch-wise reliability, i.e., prediction confidence, through adaptive feature distillation $\mathcal{L}_\text{AFD}$. In short, the degree to which each patch is distilled is derived from the combination of $S^{t-1}$ and $\bar{y}^t$. (Orange) To semantically separate the new class from the background at step $t$, the background probability of the old model is distilled into the new class probability of the new model. To further support the separation, an orthogonality loss between the new class and the background is applied.
  • Figure 3: Qualitative comparison of the baseline, MiB, and our method under the 15-1 protocol on Pascal VOC.
  • Figure 4: Comparison between the Pseudo-Labeling map (PL map) and the Selective Pseudo-Labeling map (SPL map). Both are produced by our method.
  • Figure A1: Visualization of the Pseudo-Labeling (PL) map and the Selective Pseudo-Labeling (SPL) map. Comparing the two highlights the primary objective of selective pseudo-labeling: to exclude potentially mislabeled pixels from training.
  • ...and 3 more figures
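
The selective pseudo-labeling step outlined in Figure 2 can be approximated as follows. This is a sketch under stated assumptions. The object identifier O^t is replaced by the old model's non-background probability 1 - S^{t-1}_bg. The thresholds tau_conf and tau_obj are hypothetical. Ambiguous pixels receive a hypothetical ignore index (255), so they are excluded from the loss instead of being labeled background. Pixels annotated with a new class in the current ground truth keep their labels.

```python
import numpy as np

IGNORE = 255  # hypothetical ignore index; pixels with this label are skipped by the loss


def selective_pseudo_labels(old_probs, gt, tau_conf=0.7, tau_obj=0.5, bg=0):
    """Build a selective pseudo-label map (hypothetical formulation).

    old_probs: (C, H, W) softmax output S^{t-1} of the old model over bg + old classes.
    gt:        (H, W) current-step labels; new classes are annotated, the rest is bg.
    """
    pred = old_probs.argmax(axis=0)   # plain pseudo-label: the old model's top class
    conf = old_probs.max(axis=0)      # the old model's confidence in that class
    objness = 1.0 - old_probs[bg]     # stand-in for the object identifier O^t

    out = gt.copy()
    unlabeled = gt == bg              # pixels not annotated with a new class
    confident_old = unlabeled & (pred != bg) & (conf >= tau_conf)
    confident_bg = unlabeled & (pred == bg) & (objness < tau_obj)

    out[unlabeled] = IGNORE                     # treat unlabeled pixels as ambiguous ...
    out[confident_old] = pred[confident_old]    # ... unless confidently an old class
    out[confident_bg] = bg                      # ... or confidently background
    return out
```

The resulting map keeps confident old-class and background pixels. It drops the ambiguous ones that plain pseudo-labeling would otherwise assign to the background, which is the difference that Figure 1(b) and Figure 4 illustrate.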