Continual Segmentation with Disentangled Objectness Learning and Class Recognition

Yizheng Gong; Siyue Yu; Xiaoyang Wang; Jimin Xiao

TL;DR

Continual segmentation remains challenging due to catastrophic forgetting, especially under overlapped task settings. The authors propose CoMasTRe, a two‑stage Transformer‑based framework that decouples objectness learning from class recognition, leveraging the transferability and forgetting resistance of objectness in query‑based segmentation. Objectness distillation and a dual path for class distillation preserve knowledge across tasks while task‑specific classifiers reduce interference, enabling stable lifelong learning. Empirical results on PASCAL VOC 2012 and ADE20K show state‑of‑the‑art performance with substantial gains on new classes and robust retention of old ones, highlighting the effectiveness of mask classification for continual segmentation.

Abstract

Most continual segmentation methods tackle the problem as a per-pixel classification task. However, such a paradigm is very challenging, and we find query-based segmenters with built-in objectness have inherent advantages compared with per-pixel ones, as objectness has strong transfer ability and forgetting resistance. Based on these findings, we propose CoMasTRe by disentangling continual segmentation into two stages: forgetting-resistant continual objectness learning and well-researched continual classification. CoMasTRe uses a two-stage segmenter learning class-agnostic mask proposals at the first stage and leaving recognition to the second stage. During continual learning, a simple but effective distillation is adopted to strengthen objectness. To further mitigate the forgetting of old classes, we design a multi-label class distillation strategy suited for segmentation. We assess the effectiveness of CoMasTRe on PASCAL VOC and ADE20K. Extensive experiments show that our method outperforms per-pixel and query-based methods on both datasets. Code will be available at https://github.com/jordangong/CoMasTRe.
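The multi-label class distillation mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the loss, so the following is an assumption rather than the authors' formulation: each class is treated as an independent binary label, the old model's sigmoid outputs serve as soft targets, and the new model is penalized with binary cross-entropy. The function names `sigmoid` and `multilabel_distillation` are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def multilabel_distillation(old_logits, new_logits, eps=1e-7):
    """Mean binary cross-entropy between the old model's per-class
    probabilities (soft targets) and the new model's probabilities.

    Assumption: classes are scored independently with a sigmoid, which is
    one common way to make distillation multi-label. The paper's actual
    loss may differ.
    """
    total = 0.0
    for old_logit, new_logit in zip(old_logits, new_logits):
        target = sigmoid(old_logit)
        # Clamp the prediction so that log() never receives 0.
        pred = min(max(sigmoid(new_logit), eps), 1.0 - eps)
        total -= target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred)
    return total / len(old_logits)


old = [2.0, -1.0, 0.5]      # old-model logits for three old classes
drifted = [-2.0, 1.0, 0.5]  # new-model logits that have drifted
print(multilabel_distillation(old, old) < multilabel_distillation(old, drifted))
```

Because binary cross-entropy is minimized when the prediction equals the target, the loss is smallest when the new model reproduces the old model's scores on old classes, which is the behavior a distillation term is meant to encourage.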

Paper Structure (18 sections, 5 equations, 6 figures, 11 tables, 1 algorithm)

Introduction
Related Work
Method
Problem Definition
CoMasTRe Architecture
Stage 1: Objectness Learning
Stage 2: Class Recognition
Learning without Forgetting with CoMasTRe
Objectness Distillation
Class Distillation
Experiments
Setup
Quantitative Evaluation
Ablation Studies
Conclusion
...and 3 more sections

Figures (6)

Figure 1: Hidden properties inside query-based segmenters. Objectness in query-based methods helps generalize mask proposals on unseen classes similar to learned classes (top). Additionally, because of the transfer ability of objectness, query-based methods are resistant to catastrophic forgetting of mask proposals (bottom).
Figure 2: CoMasTRe Architecture. $\bigotimes$ denotes the dot product between positional embeddings $\mathcal{E}_\mathrm{pos}$ and pixel embeddings $\mathcal{E}^4_\mathrm{pixel}$. CoMasTRe uses a two-stage image segmenter with three components: (a) a backbone and a pixel decoder producing pixel embeddings, (b) a mask decoder $f$ with learnable positional queries $\mathcal{Q}_\mathrm{pos}$ for objectness learning, and (c) a class decoder $g$ with a set of task queries $\mathcal{Q}_\mathrm{task}$ for class recognition.
Figure 3: Learning without forgetting with CoMasTRe. To tackle catastrophic forgetting, CoMasTRe separates the distillation process into two stages: objectness distillation and class distillation. We perform bipartite matching first, as in (a). Then, we distill the knowledge of objectness for the remaining embeddings, as in (b). Finally, we select positional embeddings for class distillation if they match with ground truth or have high objectness scores at the last step. The class knowledge is distilled from both matched and unmatched positional embeddings with a class decoder and task-specific classifiers, as in (c).
Figure A1: Class decoder architecture.
Figure A2: Qualitative results compared with CoMFormer (Cermelli et al., 2023) in the PASCAL VOC 15-1 setting.
...and 1 more figure
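Two mechanisms named in the captions of Figures 2 and 3 can be sketched in a few lines: mask proposals formed by the dot product between positional embeddings and pixel embeddings, and the rule that selects positional embeddings for class distillation (matched with ground truth, or high objectness score). This is an illustrative sketch under assumptions, not the authors' code: the sigmoid applied to the dot product, the threshold value of 0.5, and the function names are all hypothetical, and plain nested lists stand in for tensors.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def mask_proposals(pos_emb, pixel_emb):
    """Class-agnostic soft masks from a channel-wise dot product.

    pos_emb:   N positional embeddings, each a list of C floats.
    pixel_emb: pixel embeddings as nested lists of shape C x H x W.
    Returns N masks of shape H x W. The sigmoid is an assumption; the
    caption of Figure 2 only specifies the dot product.
    """
    channels = len(pixel_emb)
    height, width = len(pixel_emb[0]), len(pixel_emb[0][0])
    return [
        [
            [
                sigmoid(sum(q[c] * pixel_emb[c][y][x] for c in range(channels)))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for q in pos_emb
    ]


def select_for_class_distillation(matched, objectness, threshold=0.5):
    """Indices of embeddings that were matched with ground truth or whose
    objectness score exceeds a (hypothetical) threshold."""
    return [
        i
        for i, (is_matched, score) in enumerate(zip(matched, objectness))
        if is_matched or score > threshold
    ]


# Two queries, two channels, a 1 x 2 image.
masks = mask_proposals(
    pos_emb=[[1.0, 0.0], [0.0, -1.0]],
    pixel_emb=[[[0.0, 2.0]], [[3.0, 0.0]]],
)
print(len(masks), len(masks[0]), len(masks[0][0]))
print(select_for_class_distillation([True, False, False], [0.1, 0.9, 0.2]))
```

The selection keeps unmatched embeddings that still look like objects, which is consistent with the caption's statement that class knowledge is distilled from both matched and unmatched positional embeddings.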
