ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning

Beomyoung Kim; Joonsang Yu; Sung Ju Hwang

TL;DR

ECLIPSE addresses continual panoptic segmentation by freezing the base model and learning only a compact set of visual prompt embeddings at each step, so new classes can be added while prior knowledge is preserved. It introduces logit manipulation to mitigate semantic drift and error propagation, and employs deep prompt tuning to sustain plasticity. On ADE20K, ECLIPSE achieves state-of-the-art results while training only a fraction of the parameters and at reduced computational cost, and it remains robust to forgetting across multiple continual steps. The approach lowers training complexity and improves scalability for real-world continual segmentation, with potential for further gains from stronger frozen backbones and pretraining.

Abstract

Panoptic segmentation, combining semantic and instance segmentation, stands as a cutting-edge computer vision task. Despite recent progress with deep learning models, the dynamic nature of real-world applications necessitates continual learning, where models must adapt to new classes over time (plasticity) without forgetting old ones (avoiding catastrophic forgetting). Current continual segmentation methods often rely on distillation strategies such as knowledge distillation and pseudo-labeling, which are effective but increase training complexity and computational overhead. In this paper, we introduce a novel and efficient method for continual panoptic segmentation based on Visual Prompt Tuning, dubbed ECLIPSE. Our approach freezes the base model parameters and fine-tunes only a small set of prompt embeddings, addressing both catastrophic forgetting and plasticity while significantly reducing the number of trainable parameters. To mitigate inherent challenges such as error propagation and semantic drift in continual segmentation, we propose logit manipulation to effectively leverage common knowledge across the classes. Experiments on the ADE20K continual panoptic segmentation benchmark demonstrate the superiority of ECLIPSE, notably its robustness against catastrophic forgetting and its reasonable plasticity, achieving a new state-of-the-art. The code is available at https://github.com/clovaai/ECLIPSE.

Paper Structure (26 sections, 6 equations, 5 figures, 5 tables)

Introduction
Related Work
Panoptic Segmentation
Continual Segmentation
Visual Prompt Tuning (VPT) in Continual Learning
Preliminary
Problem Setting
Network Architecture: Mask2Former
Method
Prompt Tuning for Continual Segmentation
Resolving Semantic Confusion and Drift
Experiments
Experimental Setting
Dataset and Evaluation Metrics
Incremental Protocol
...and 11 more sections

Figures (5)

Figure 1: Comparison of the overview of (a) previous methods and (b) our method. Previous methods rely on distillation strategies such as knowledge distillation and pseudo-labeling, demanding more training complexity and computational overhead. In contrast, our method freezes all trained parameters and fine-tunes only a small set of prompt embeddings, robustly keeping the previous knowledge and extending the scalability of the model.
Figure 2: Overview of ECLIPSE. We freeze all trained parameters and fine-tune only a set of prompt embeddings $\mathbf{Q}^{t}$ alongside MLP layers to recognize a set of classes $\mathcal{C}^{t}$. In inference, we aggregate outputs from all prompt sets $\mathbf{Q}^{1:t}$ to segment all learned classes $\mathcal{C}^{1:t}$.
Figure 3: Illustration of logit manipulation. To alleviate semantic drift of no-obj class, we make a new no-obj logit leveraging the inter-class knowledge of all learned classes. Moreover, an erroneous prediction caused by semantic confusion of prior frozen parameters can be fixed through logit manipulation.
Figure 4: Qualitative samples for logit manipulation. At step 1, the model, which learned classes $\mathcal{C}^{1}$ containing water and car, can produce incorrect predictions due to semantic confusion with unexplored classes; these errors propagate forward continuously, resulting in overlapping predictions for one object (3rd column). After the model learns new classes containing lake and van at step 6, the logit manipulation can suppress the prior errors.
Figure 5: Qualitative comparisons between ECLIPSE and CoMFormer (Cermelli et al., 2023) on the ADE20K 100-10 continual panoptic segmentation scenario. Our ECLIPSE shows more robust results against catastrophic forgetting without reliance on distillation strategies.
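
The Figure 2 caption describes the core mechanism: earlier parameters stay frozen, a new prompt set $\mathbf{Q}^{t}$ with its own MLP head is trained for the classes $\mathcal{C}^{t}$, and at inference the outputs of all prompt sets $\mathbf{Q}^{1:t}$ are aggregated. The following is a minimal NumPy sketch of that bookkeeping only, not the authors' implementation. The class `PromptBank`, its methods, and all sizes are hypothetical; the frozen Mask2Former decoder, mask prediction, and the no-obj class are omitted; and the block-structured aggregation (each prompt set scores only its own classes) is an assumption made for illustration.

```python
import numpy as np


class PromptBank:
    """Toy stand-in for per-step prompt sets Q^t, each with one linear class head.

    Hypothetical names throughout. The frozen decoder is omitted, so prompt
    embeddings are fed to the class heads directly.
    """

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = np.random.default_rng(seed)
        self.prompts = []  # prompts[t] has shape (n_queries_t, dim)
        self.heads = []    # heads[t] has shape (dim, n_classes_t)
        self.frozen = []   # frozen[t] is True once a later step has been added

    def add_step(self, n_queries, n_classes):
        # Everything learned in earlier steps is frozen before new prompts are added.
        self.frozen = [True] * len(self.frozen)
        self.prompts.append(self.rng.normal(size=(n_queries, self.dim)))
        self.heads.append(self.rng.normal(size=(self.dim, n_classes)))
        self.frozen.append(False)

    def trainable_params(self):
        # Only the newest prompt set and its head count as trainable.
        return sum(q.size + w.size
                   for q, w, f in zip(self.prompts, self.heads, self.frozen)
                   if not f)

    def aggregate_logits(self):
        # Assumption: queries from step t score only the classes in C^t, so the
        # aggregated matrix over C^{1:t} is block-structured; entries outside a
        # prompt set's own block are set to -inf.
        n_q = sum(q.shape[0] for q in self.prompts)
        n_c = sum(w.shape[1] for w in self.heads)
        logits = np.full((n_q, n_c), -np.inf)
        row = col = 0
        for q, w in zip(self.prompts, self.heads):
            logits[row:row + q.shape[0], col:col + w.shape[1]] = q @ w
            row += q.shape[0]
            col += w.shape[1]
        return logits


bank = PromptBank(dim=8)
bank.add_step(n_queries=4, n_classes=3)  # base step (illustrative sizes)
bank.add_step(n_queries=2, n_classes=2)  # one continual step
logits = bank.aggregate_logits()
print(logits.shape, bank.trainable_params())
```

In this sketch the trainable parameter count depends only on the newest step, which is the property the caption emphasizes; how the real model combines class scores across prompt sets (including the logit manipulation of Figure 3) is not reproduced here.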
