Learning at a Glance: Towards Interpretable Data-limited Continual Semantic Segmentation via Semantic-Invariance Modelling

Bo Yuan, Danpei Zhao, Zhenwei Shi

TL;DR

This work tackles continual semantic segmentation (CSS) with limited incremental data by introducing Learning at a Glance (LAG), a semantically principled, interpretable framework. LAG decomposes representations into semantic-invariant and sample-specific components via channel-wise decoupling and neuron-relevant semantic consistency. It couples this decomposition with a disentangled distillation regime, consisting of prototype alignment (SPM) and feature preserving (SFP), together with an explicit unknown class and uncertainty-aware pseudo-labelling. The approach achieves competitive CSS performance on VOC, ADE20K, and ISPRS, showing strong resistance to forgetting and good adaptability under data-limited conditions, while neuron-level relevance analysis provides interpretability. By combining memory-free learning, interpretable constraints, and data-efficient protocols, the work moves CSS closer to practical, real-world use.
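The channel-wise decoupling and disentangled distillation described above can be illustrated with a small sketch. It assumes, hypothetically, that the first `invariant_ratio` share of feature channels holds the semantic-invariant term and the remaining channels the sample-specific term. It stands in for SPM with a squared-error alignment of class prototypes (masked average pooling) and for SFP with a mean-squared feature-matching loss. These loss forms and all function names are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def split_channels(features, invariant_ratio=0.5):
    """Split a (C, H, W) feature map along the channel axis.

    Assumption: the first `invariant_ratio` share of channels is treated as the
    semantic-invariant term, the remainder as the sample-specific term.
    """
    k = int(round(features.shape[0] * invariant_ratio))
    return features[:k], features[k:]

def class_prototypes(invariant, labels, classes):
    """Masked average pooling: one prototype vector per class present in `labels`."""
    flat = invariant.reshape(invariant.shape[0], -1)  # (K, H*W)
    lab = labels.reshape(-1)                          # (H*W,)
    protos = {}
    for cls in classes:
        mask = lab == cls
        if mask.any():
            protos[cls] = flat[:, mask].mean(axis=1)
    return protos

def disentangled_distillation(old_feat, new_feat, labels, classes, invariant_ratio=0.5):
    """Return (prototype_loss, feature_loss) between old-model and new-model features.

    old_feat, new_feat: (C, H, W) features from the frozen old model and the current model.
    labels: (H, W) integer map, e.g. the old model's argmax prediction (hypothetical choice).
    """
    old_inv, old_spec = split_channels(old_feat, invariant_ratio)
    new_inv, new_spec = split_channels(new_feat, invariant_ratio)

    # Stand-in for SPM: keep class prototypes built from invariant channels aligned.
    old_p = class_prototypes(old_inv, labels, classes)
    new_p = class_prototypes(new_inv, labels, classes)
    proto_loss = (
        float(np.mean([np.sum((old_p[c] - new_p[c]) ** 2) for c in old_p])) if old_p else 0.0
    )

    # Stand-in for SFP: preserve the sample-specific channels with a plain MSE.
    feat_loss = float(np.mean((old_spec - new_spec) ** 2))
    return proto_loss, feat_loss
```

In this sketch the two losses act on disjoint channel groups: shifting only the sample-specific channels of `new_feat` by 1 leaves the prototype term at 0 and gives a feature term of about 1, while shifting only the invariant channels does the opposite. That separation is the kind of behaviour the disentangled scheme aims for.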

Abstract

Continual semantic segmentation (CSS) based on incremental learning (IL) is an important step towards human-like segmentation models. However, current CSS approaches struggle with the trade-off between preserving old knowledge and learning new knowledge: they still require large-scale annotated data for incremental training, and they lack interpretability. In this paper, we present Learning at a Glance (LAG), an efficient, robust, human-like and interpretable approach for CSS. LAG is a simple, model-agnostic architecture, yet it achieves competitive CSS performance with limited incremental data. Inspired by human recognition patterns, we propose a semantic-invariance modelling approach based on semantic feature decoupling that simultaneously reconciles solid knowledge inheritance with the learning of new knowledge. Concretely, the decoupling operates in two ways: channel-wise decoupling and spatial-level neuron-relevant semantic consistency. Our approach preserves semantic-invariant knowledge as solid prototypes to alleviate catastrophic forgetting, while constraining sample-specific content through an asymmetric contrastive learning method to enhance model robustness during IL steps. Experimental results on multiple datasets validate the effectiveness of the proposed method. Furthermore, we introduce a novel CSS protocol that better reflects realistic data-limited CSS settings, under which LAG achieves superior performance across multiple data-limited conditions.
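The spatial-level neuron-relevant semantic consistency named in the abstract can be approximated, for illustration only, by a relation-matching distillation loss between the old model $M^{t-1}$ and the current model $M^t$. The sketch below assumes that "neuron relevance" is the cosine similarity between channel activation maps and "pixel relevance" is the cosine similarity between per-pixel feature vectors, and it penalizes the squared difference of these relevance matrices between the two models. The paper's exact definition may differ, and the function names are hypothetical.

```python
import numpy as np

def cosine_relevance(x, eps=1e-8):
    """Pairwise cosine similarity between the rows of x: (N, D) -> (N, N)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x_hat = x / np.maximum(norms, eps)
    return x_hat @ x_hat.T

def semantic_consistency_loss(old_feat, new_feat):
    """Relation-matching loss between (C, H, W) features of M^{t-1} and M^t (assumed form)."""
    c = old_feat.shape[0]
    old_flat = old_feat.reshape(c, -1)  # (C, H*W): one row per channel ("neuron")
    new_flat = new_feat.reshape(c, -1)
    # Channel relevance: C x C similarity between channel activation maps.
    channel_term = np.mean((cosine_relevance(old_flat) - cosine_relevance(new_flat)) ** 2)
    # Pixel relevance: (H*W) x (H*W) similarity between per-pixel feature vectors.
    pixel_term = np.mean((cosine_relevance(old_flat.T) - cosine_relevance(new_flat.T)) ** 2)
    return float(channel_term + pixel_term)
```

Because only relations are matched, rescaling the new features by a positive constant leaves this loss at (numerically) zero, whereas an unrelated feature map yields a clearly positive value. Under this reading, the new model keeps freedom over feature magnitudes while the relational structure of old knowledge is preserved.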

Paper Structure

This paper contains 30 sections, 20 equations, 13 figures, 12 tables, and 1 algorithm.

Figures (13)

  • Figure 1: Schematic illustration of the proposed approach. (a) $\textbf{F}_\theta$ is a classifier pretrained on realistic images, without access to the abstract images shown above. We argue that the model has the potential to classify such images correctly from their key features, which is the inspiration for this paper. (b) The proposed disentangled distillation scheme. Through the disentangled mechanism and the collaboration of the corresponding constraints, SPM and SFP, our model achieves robust CSS performance by alleviating catastrophic forgetting and semantic drift.
  • Figure 2: The pipeline of the proposed method at step $t$, using an old class (motorbike) and a new class (person) as an example. At each incremental step, the old model is frozen and no old data can be accessed, while the current model is trained only on the incremental data. The neuron-relevant semantic-consistency constraint (NSC) is applied between $M^{t-1}$ and $M^t$. The semantic features are disentangled into a semantic-invariant term and a sample-specific term in the latent space. The prototype alignment (SPM) and feature preserving (SFP) modules are designed separately but trained jointly for disentangled knowledge transfer.
  • Figure 3: The neuron-relevant semantic-consistency constraint at step $t$. The feature relevance constraint is imposed on the neuron relevance between $M^{t-1}$ and $M^t$, and the pixel relevance illustrates the semantic consistency between neurons.
  • Figure 4: Uncertainty-aware unknown class modelling.
  • Figure 5: Evolution of mIoU (%) against the number of learned classes on the VOC 15-1 task.
  • ...and 8 more figures
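Figure 4 and the TL;DR mention an explicit unknown class with uncertainty-aware pseudo-labelling. One common way to realize such a scheme in CSS, shown here only as a hedged sketch, is to relabel the background pixels of the current step using the frozen old model: confident old-class predictions become pseudo-labels, while pixels whose normalized prediction entropy exceeds a threshold are assigned to an unknown class. The entropy-based uncertainty measure, the threshold `tau`, and the `unknown_id` index are assumptions, not definitions or values taken from the paper.

```python
import numpy as np

def uncertainty_aware_pseudo_labels(gt, old_probs, *, unknown_id, tau=0.5):
    """Relabel background pixels of the current step (assumed scheme).

    gt:         (H, W) int labels for step t; 0 = background, old classes unlabelled.
    old_probs:  (K, H, W) softmax output of the frozen old model (K >= 2), channel 0 = background.
    unknown_id: index reserved for the explicit unknown class (hypothetical).
    tau:        threshold on normalized entropy in [0, 1] (hypothetical value).
    """
    k = old_probs.shape[0]
    # Normalized entropy: 0 for a one-hot prediction, 1 for a uniform one.
    entropy = -np.sum(old_probs * np.log(np.clip(old_probs, 1e-12, 1.0)), axis=0) / np.log(k)
    old_pred = old_probs.argmax(axis=0)
    confident = entropy < tau
    bg = gt == 0

    out = gt.copy()
    # Confident old-class predictions on background pixels become pseudo-labels.
    take_old = bg & confident & (old_pred > 0)
    out[take_old] = old_pred[take_old]
    # Uncertain background pixels are routed to the explicit unknown class.
    out[bg & ~confident] = unknown_id
    return out
```

Pixels carrying a current-step label are left untouched, and confident background predictions stay background. Routing uncertain pixels to a separate class, rather than forcing them into background, is one way to limit the background shift that commonly causes semantic drift in CSS.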