SAILS: Segment Anything with Incrementally Learned Semantics for Task-Invariant and Training-Free Continual Learning

Shishir Muralidhara, Didier Stricker, René Schuster

TL;DR

SAILS tackles class-incremental semantic segmentation without retraining by decoupling spatial segmentation from semantic labeling: SAM provides zero-shot region extraction, while class prototypes are computed from the embeddings of a frozen backbone in a fixed feature space. Selective intra-class clustering adds multiple prototypes per class where needed to capture intra-class variability. Because the pipeline is fully training-free and no parameters are updated, it avoids forgetting and allows positive backward transfer. Empirically, SAILS outperforms several training-based baselines on PASCAL VOC and Cityscapes, especially in long task sequences, and shows task-invariant performance. The work highlights the potential of combining foundation-model segmentation with prototype-based semantics for efficient continual learning and points to future improvements via contrastive refinements on frozen representations.

Abstract

Continual learning remains constrained by the need for repeated retraining, high computational costs, and the persistent challenge of forgetting. These factors significantly limit the applicability of continual learning in real-world settings, as iterative model updates demand substantial computational resources and inherently exacerbate forgetting. We present SAILS (Segment Anything with Incrementally Learned Semantics), a training-free framework for Class-Incremental Semantic Segmentation (CISS) that sidesteps these challenges entirely. SAILS leverages foundation models to decouple CISS into two stages: zero-shot region extraction using the Segment Anything Model (SAM), followed by semantic association through prototypes in a fixed feature space. SAILS incorporates selective intra-class clustering, resulting in multiple prototypes per class to better model intra-class variability. Our results demonstrate that, despite requiring no incremental training, SAILS typically surpasses the performance of existing training-based approaches on standard CISS datasets, particularly in long and challenging task sequences where forgetting tends to be most severe. By avoiding parameter updates, SAILS completely eliminates forgetting and maintains consistent, task-invariant performance. Furthermore, SAILS exhibits positive backward transfer, where the introduction of new classes can enhance performance on previous classes.
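
The second stage described in the abstract (semantic association through prototypes, with several prototypes for classes that vary a lot) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the class name `PrototypeMemory`, the cosine-compactness test that decides when to cluster, the threshold value, the fixed `k`, and the small k-means helper are all hypothetical. In practice the embeddings would come from a frozen backbone applied to SAM regions; here they are toy 2-D vectors.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means on row vectors; returns a (k, d) centroid array."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return centroids


class PrototypeMemory:
    """Hypothetical prototype store: classes are added, never retrained."""

    def __init__(self, compact_threshold=0.8, k=2):
        self.compact_threshold = compact_threshold  # assumed selection rule
        self.k = k                                  # assumed sub-prototype count
        self.prototypes = []                        # unit-length vectors
        self.labels = []                            # class label per prototype

    def add_class(self, label, embeddings):
        z = l2_normalize(np.asarray(embeddings, dtype=float))
        mean = l2_normalize(z.mean(axis=0))
        # Mean cosine similarity to the class mean: high means a compact class.
        compactness = float((z @ mean).mean())
        if compactness >= self.compact_threshold or len(z) < self.k:
            protos = mean[None, :]                  # one prototype is enough
        else:
            protos = l2_normalize(kmeans(z, self.k))  # several sub-prototypes
        for p in protos:
            self.prototypes.append(p)
            self.labels.append(label)

    def classify(self, embedding):
        """Return the label of the most similar prototype (cosine similarity)."""
        z = l2_normalize(np.asarray(embedding, dtype=float))
        sims = np.stack(self.prototypes) @ z
        return self.labels[int(sims.argmax())]


memory = PrototypeMemory()
memory.add_class("cat", [[1, 0], [1, 0.05], [0.99, -0.03]])      # compact class
memory.add_class("bus", [[0, 1], [0.05, 1], [-1, 0], [-1, 0.05]])  # two modes
print(memory.labels, memory.classify([-1, 0.1]))
```

Adding a class only appends entries to the store, so prototypes of earlier classes are never modified. This is the property the abstract relies on when it says that avoiding parameter updates eliminates forgetting.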

Paper Structure (22 sections, 4 equations, 3 figures, 5 tables)

Figures (3)

  • Figure 1: The input image is first segmented using SAM, and the resulting masks are iteratively refined to produce an aggregated mask with distinct, non-overlapping regions, which are then used for extraction of region proposals from the image.
  • Figure 2: Overview of the incremental semantic learning process. Regions of interest (RoIs) corresponding to classes introduced in the current task are embedded using a frozen pretrained backbone. For each class, either a single prototype or multiple sub-prototypes are computed based on the intra-class variability.
  • Figure 3: CLIP misclassifications of region segments due to shortcut learning and ambiguous regions.
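
The mask aggregation summarized in the Figure 1 caption (overlapping SAM masks reduced to distinct, non-overlapping regions) can be sketched as follows. The caption does not state the refinement rule, so the rule used here is an assumption: masks are processed from largest to smallest, each mask keeps only pixels not yet claimed, and leftovers smaller than `min_area` are discarded. The function name `aggregate_masks` and the `min_area` parameter are hypothetical, and the boolean arrays stand in for the masks a SAM mask generator would return.

```python
import numpy as np


def aggregate_masks(masks, min_area=1):
    """Merge overlapping boolean masks into one integer region map.

    Returns an array where 0 means "unassigned" and 1..N are region ids.
    Assumed rule: larger masks claim pixels first; later masks keep only
    their still-unclaimed pixels.
    """
    masks = [np.asarray(m, dtype=bool) for m in masks]
    if not masks:
        raise ValueError("at least one mask is required")
    region_map = np.zeros(masks[0].shape, dtype=np.int32)
    next_id = 1
    for mask in sorted(masks, key=lambda m: int(m.sum()), reverse=True):
        free = mask & (region_map == 0)   # pixels no earlier mask has claimed
        if int(free.sum()) < min_area:    # drop tiny leftovers
            continue
        region_map[free] = next_id
        next_id += 1
    return region_map


# Two overlapping toy masks on a 4x4 image.
left = np.zeros((4, 4), dtype=bool)
left[:, :3] = True    # columns 0-2, area 12
right = np.zeros((4, 4), dtype=bool)
right[:, 2:] = True   # columns 2-3, area 8, overlaps `left` in column 2
print(aggregate_masks([left, right]))
```

Each nonzero id in the resulting map marks one region proposal that could then be cropped from the image and passed to the frozen backbone.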