Decoupling Continual Semantic Segmentation

Yifu Guo, Yuquan Lu, Wentao Zhang, Zishan Xu, Dexia Chen, Siyu Zhang, Yizhe Zhang, Ruixuan Wang

TL;DR

This paper tackles continual semantic segmentation (CSS) by decoupling class-aware existence detection from class-agnostic segmentation. It introduces DecoupleCSS, a two-stage framework: language-guided detection with LoRA adapters generates location-aware, class-specific prompts, and SAM turns those prompts into segmentation masks. The segmentation module stays frozen to promote knowledge sharing across classes. The approach reports state-of-the-art results on PASCAL VOC 2012 and ADE20K across multiple CSS settings, with ablations showing gains from LoRA, per-class prompt generation, and semantic alignment. It also suggests that vision-language foundation models can be used for CSS with manageable memory overhead and predictable inference time, a step toward scalable continual learning for dense prediction tasks.
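The decoupling idea can be illustrated with a toy sketch. Everything here is a stand-in chosen for illustration, not the authors' implementation: `detect_classes`, `segment_from_prompt`, `decoupled_segment`, and the `score_fn` callback are hypothetical names, the detector is a simple per-class heatmap threshold, and the "segmenter" is an intensity-similarity rule that plays the role of a frozen, class-agnostic model such as SAM.

```python
import numpy as np


def detect_classes(image, class_names, score_fn, threshold=0.5):
    """Stage 1 (class-aware): decide which classes exist and emit one point prompt each.

    score_fn(image, name) is a hypothetical callback returning an (H, W) score map.
    """
    prompts = {}
    for name in class_names:
        heatmap = score_fn(image, name)
        if heatmap.max() >= threshold:
            # Use the highest-scoring pixel as a location-aware prompt.
            y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
            prompts[name] = (int(y), int(x))
    return prompts


def segment_from_prompt(image, point, tol=0.1):
    """Stage 2 (class-agnostic): toy stand-in for a frozen promptable segmenter.

    It never sees a class label, only a location prompt.
    """
    y, x = point
    return np.abs(image - image[y, x]) <= tol


def decoupled_segment(image, class_names, score_fn):
    """Combine both stages into a label map (0 = background, i = i-th class)."""
    label_map = np.zeros(image.shape, dtype=np.int64)
    prompts = detect_classes(image, class_names, score_fn)
    for idx, name in enumerate(class_names, start=1):
        if name in prompts:
            label_map[segment_from_prompt(image, prompts[name])] = idx
    return label_map
```

In this sketch, adding a class in a later step only extends `class_names` and the detector behind `score_fn`; `segment_from_prompt` is never modified, which is the sense in which segmentation knowledge is shared across old and new classes.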

Abstract

Continual Semantic Segmentation (CSS) requires learning new classes without forgetting previously acquired knowledge, addressing the fundamental challenge of catastrophic forgetting in dense prediction tasks. However, existing CSS methods typically employ single-stage encoder-decoder architectures where segmentation masks and class labels are tightly coupled, leading to interference between old and new class learning and suboptimal retention-plasticity balance. We introduce DecoupleCSS, a novel two-stage framework for CSS. By decoupling class-aware detection from class-agnostic segmentation, DecoupleCSS enables more effective continual learning, preserving past knowledge while learning new classes. The first stage leverages pre-trained text and image encoders, adapted using LoRA, to encode class-specific information and generate location-aware prompts. In the second stage, the Segment Anything Model (SAM) is employed to produce precise segmentation masks, ensuring that segmentation knowledge is shared across both new and previous classes. This approach improves the balance between retention and adaptability in CSS, achieving state-of-the-art performance across a variety of challenging tasks. Our code is publicly available at: https://github.com/euyis1019/Decoupling-Continual-Semantic-Segmentation.
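For readers unfamiliar with LoRA, the following is a minimal sketch of the generic low-rank update, in which a frozen weight matrix W is augmented by a trainable product scaled by alpha / rank. The class name `LoRALinear`, the rank, the scaling value, and the initialization constants are assumptions for illustration; the abstract does not state which encoder layers are adapted or which hyperparameters the authors use.

```python
import numpy as np


class LoRALinear:
    """Linear layer with a frozen base weight and a trainable low-rank update."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight  # pre-trained weight, kept frozen
        # A starts small and random, B starts at zero, so the update is zero at init.
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))
        self.B = np.zeros((out_dim, rank))
        self.scale = alpha / rank

    def __call__(self, x):
        # Equivalent to x @ (W + scale * B @ A).T without forming the full update.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def num_trainable(self):
        """Number of parameters that would be updated during adaptation."""
        return self.A.size + self.B.size
```

Because `B` is zero at initialization, the adapted layer starts out identical to the pre-trained one, and only `A` and `B` (far fewer values than the full weight matrix) would be trained per step.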

Paper Structure

This paper contains 33 sections, 7 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Overview of the proposed method. (a) The overall architecture. (b) Representative results on challenging settings (2-2 and 4-2) for CSS on Pascal VOC 2012.
  • Figure 2: Workflow of the LTCD and SPG module.
  • Figure 3: Visual comparison after the last task on the Pascal VOC 2012 10-1 setting.
  • Figure 4: Visual comparison after the last task on the Pascal VOC 2012 15-1 setting. We demonstrate our ability to resist forgetting and learn new categories.
  • Figure 5: Sensitivity study on Pascal VOC 2012.
  • ...and 5 more figures