CIT: Rethinking Class-incremental Semantic Segmentation with a Class Independent Transformation

Jinchao Ge, Bowen Zhang, Akide Liu, Minh Hieu Phan, Qi Chen, Yangyang Shu, Yang Zhao

TL;DR

A simple, yet effective Class Independent Transformation (CIT) is introduced that converts the outputs of existing semantic segmentation models into class-independent forms with negligible cost or performance loss and establishes an accumulative distillation framework, ensuring equitable incorporation of all class information.

Abstract

Class-incremental semantic segmentation (CSS) requires that a model learn to segment new classes without forgetting how to segment previous ones: this is typically achieved by distilling the current knowledge and incorporating the latest data. However, existing class-specific CSS methods do not support bypassing iterative distillation by directly transferring the outputs of initial classes to the current learning task. Via Softmax, they enforce dependency between classes and adjust the output distribution at each learning step, resulting in a large probability distribution gap between initial and current tasks. We introduce a simple yet effective Class Independent Transformation (CIT) that converts the outputs of existing semantic segmentation models into class-independent forms with negligible cost or performance loss. By utilizing the class-independent predictions facilitated by CIT, we establish an accumulative distillation framework, ensuring equitable incorporation of all class information. We conduct extensive experiments on various segmentation architectures, including DeepLabV3, Mask2Former, and SegViTv2. Results from these experiments show minimal task forgetting across different datasets, with less than 5% forgetting on ADE20K in the most challenging 11-task configurations and less than 1% across all configurations on the PASCAL VOC 2012 dataset.
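The Softmax dependency described in the abstract can be illustrated with a small numerical sketch. It is not the paper's CIT: the per-class sigmoid used here is only an assumed stand-in for a class-independent output, and the logit values are hypothetical. The sketch shows that appending logits for new classes changes the Softmax probabilities of the old classes, while independent per-class scores stay the same.

```python
import math

def softmax(logits):
    # Numerically stable softmax: every output depends on all logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    # Per-class sigmoid: each output depends only on its own logit.
    # Assumption: used here as a generic class-independent output,
    # not as the paper's actual transformation.
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]

# Hypothetical logits of one pixel for three initial classes.
old_logits = [2.0, 0.5, -1.0]
# A later step appends two new classes; old-class logits are unchanged.
new_logits = old_logits + [3.0, 1.5]

n_old = len(old_logits)
softmax_before = softmax(old_logits)
softmax_after = softmax(new_logits)[:n_old]
sigmoid_before = sigmoid(old_logits)
sigmoid_after = sigmoid(new_logits)[:n_old]

# Largest change in any old-class score caused by adding the new classes.
softmax_shift = max(abs(a - b) for a, b in zip(softmax_before, softmax_after))
sigmoid_shift = max(abs(a - b) for a, b in zip(sigmoid_before, sigmoid_after))

print(f"softmax shift on old classes: {softmax_shift:.3f}")
print(f"sigmoid shift on old classes: {sigmoid_shift:.3f}")
```

Under these assumptions, the old-class Softmax outputs move substantially once new logits compete for probability mass, which is the distribution gap the abstract refers to. The independent scores are unaffected, so an earlier model's outputs for its own classes could in principle be reused directly at a later step.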

Paper Structure

This paper contains 18 sections, 2 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: The top panel illustrates the current iterative distillation pipeline, where knowledge is sequentially transferred and distilled across tasks. This sequential approach can cause an accumulation of errors over time. In contrast, the lower panel illustrates the use of independent logits, enabling the implementation of our proposed accumulative pipeline. Direct distillation from the source model for each class aids in preserving the integrity of acquired information during the learning sequence. Each square represents the addition of a new task at each stage.
  • Figure 2: CIT Overview. At each task $t$, the model trains its decoder to incorporate the latest class labels, enabling it to produce segmentation predictions for the newly introduced classes. First, the teacher model $\mathcal{T}^{t-1}$ generates pseudo-labels for the existing classes up to task $t-1$. The model is then trained with two key components: a feature knowledge distillation loss $\mathcal{L}_{distill}$ ($\mathcal{L}_d$) and a supervised loss $\mathcal{L}_{supervised}$ ($\mathcal{L}_s$).
  • Figure 3: Comparison of the extent of forgetting, measured as mIoU on the base tasks versus the training step count, in two challenging protocols: 100-10 and 100-5. (a) and (d) use DeepLabV3, (b) and (e) use Mask2Former, and (c) and (f) use SegViTv2; each is compared against prior methods.
  • Figure 4: DeepLabV3 19-1 (5 tasks)
  • Figure 5: SegViTv2 19-1 (5 tasks)