CFSSeg: Closed-Form Solution for Class-Incremental Semantic Segmentation of 2D Images and 3D Point Clouds

Jiaxu Li, Rui Li, Jianyu Qi, Songning Lai, Linpu Lv, Kejia Fan, Jianheng Tang, Yutao Yue, Dongzhan Zhou, Yuanhuai Liu, Huiping Zhuang

TL;DR

CFSSeg introduces a gradient-free, closed-form solution for class-incremental semantic segmentation across 2D images and 3D point clouds. The method freezes a pretrained encoder, uses a high-dimensional random feature expansion (RHL) to boost plasticity, and updates the classifier head with a recursive closed-form ridge regression (C-RLS) that cumulatively incorporates past information via a memory matrix $\mathbf{\Psi}_t$ without storing samples. Pseudo-labeling based on uncertainty mitigates semantic drift in 2D, while BALD-guided uncertainty with KNN context handles drift in 3D, enabling exemplar-free continual learning. Across Pascal VOC2012, S3DIS, and ScanNet, CFSSeg achieves state-of-the-art performance with significantly reduced training time (single-pass per step) and improved data privacy, making it practical for real-time and privacy-constrained deployments.
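The pipeline summarized above (frozen features, a random high-dimensional expansion, and a ridge-regression head) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian projection, the ReLU activation, the dimensions, the synthetic data, and the names `expand_features`, `fit_ridge_head`, and `gamma` are all chosen for the example.

```python
import numpy as np

def expand_features(feats, proj):
    """Random expansion: fixed random projection followed by ReLU (assumed activation)."""
    return np.maximum(feats @ proj, 0.0)

def fit_ridge_head(E, Y, gamma=1.0):
    """Closed-form ridge regression: argmin_W ||Y - E W||^2 + gamma * ||W||^2."""
    d = E.shape[1]
    return np.linalg.solve(E.T @ E + gamma * np.eye(d), E.T @ Y)

rng = np.random.default_rng(0)
n_pixels, feat_dim, expand_dim, n_classes = 500, 16, 256, 4

# Synthetic stand-in for per-pixel features from a frozen encoder (hypothetical data).
feats = rng.normal(size=(n_pixels, feat_dim))
labels = rng.integers(0, n_classes, size=n_pixels)
Y = np.eye(n_classes)[labels]                    # one-hot targets, one row per pixel

proj = rng.normal(size=(feat_dim, expand_dim))   # drawn once and never trained
E = expand_features(feats, proj)                 # expanded features
W = fit_ridge_head(E, Y, gamma=1.0)              # classifier head, no gradient steps

pred = (E @ W).argmax(axis=1)                    # per-pixel class prediction
```

The head is obtained from one linear solve, which is why a single pass over the data is enough in this kind of scheme.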

Abstract

2D images and 3D point clouds are foundational data types for multimedia applications, including real-time video analysis, augmented reality (AR), and 3D scene understanding. Class-incremental semantic segmentation (CSS) requires incrementally learning new semantic categories while retaining prior knowledge. Existing methods typically rely on computationally expensive training based on stochastic gradient descent, employing complex regularization or exemplar replay. However, stochastic gradient descent-based approaches inevitably update the model's weights for past knowledge, leading to catastrophic forgetting, a problem exacerbated by pixel/point-level granularity. To address these challenges, we propose CFSSeg, a novel exemplar-free approach that leverages a closed-form solution, offering a practical and theoretically grounded solution for continual semantic segmentation tasks. This eliminates the need for iterative gradient-based optimization and storage of past data, requiring only a single pass through new samples per step. It not only enhances computational efficiency but also provides a practical solution for dynamic, privacy-sensitive multimedia environments. Extensive experiments on 2D and 3D benchmark datasets such as Pascal VOC2012, S3DIS, and ScanNet demonstrate CFSSeg's superior performance.

Paper Structure

This paper contains 19 sections, 1 theorem, 33 equations, 2 figures, 6 tables, 1 algorithm.

Key Result

Theorem 1

The weights $\hat{\mathbf{\Phi}}_t$ obtained recursively are equivalent to those obtained from Eq. (9) for step $t$. The matrix $\mathbf{\Psi}_t$ can also be updated recursively.
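The update equations that Theorem 1 refers to are not reproduced in this summary. As a hedged illustration of the kind of equivalence the theorem states, the sketch below uses the standard recursive least-squares update derived from the Woodbury identity and checks numerically that it matches a batch ridge solution over all data. Whether the paper's equations take exactly this form is an assumption; the function names, the zero-padding of the head for new classes, and the synthetic data are also assumptions made for the example.

```python
import numpy as np

def init_step(E, Y, gamma=1.0):
    """Step 0: regularized inverse Gram matrix (memory) and ridge weights."""
    d = E.shape[1]
    Psi = np.linalg.inv(E.T @ E + gamma * np.eye(d))
    return Psi @ E.T @ Y, Psi

def recursive_step(Phi_prev, Psi_prev, E, Y):
    """Standard RLS update (assumed form) using only new data and the memory matrix."""
    # Zero-pad the head with columns for classes that appear for the first time.
    n_new = Y.shape[1] - Phi_prev.shape[1]
    Phi_pad = np.hstack([Phi_prev, np.zeros((Phi_prev.shape[0], n_new))])
    # Woodbury identity: Psi = (Psi_prev^{-1} + E^T E)^{-1}
    K = np.linalg.inv(np.eye(E.shape[0]) + E @ Psi_prev @ E.T)
    Psi = Psi_prev - Psi_prev @ E.T @ K @ E @ Psi_prev
    Phi = Phi_pad + Psi @ E.T @ (Y - E @ Phi_pad)
    return Phi, Psi

rng = np.random.default_rng(0)
d, gamma = 32, 1.0
E0 = rng.normal(size=(200, d))
E1 = rng.normal(size=(150, d))
Y0 = np.eye(3)[rng.integers(0, 3, size=200)]   # step 0: classes 0..2
Y1 = np.eye(5)[rng.integers(3, 5, size=150)]   # step 1: new classes 3..4

Phi, Psi = init_step(E0, Y0, gamma)
Phi, Psi = recursive_step(Phi, Psi, E1, Y1)    # step-0 samples are not revisited

# Batch reference: ridge regression on all data at once.
E_all = np.vstack([E0, E1])
Y_all = np.vstack([np.hstack([Y0, np.zeros((200, 2))]), Y1])
Phi_batch = np.linalg.solve(E_all.T @ E_all + gamma * np.eye(d), E_all.T @ Y_all)
max_gap = np.abs(Phi - Phi_batch).max()
```

In this sketch `max_gap` is at the level of floating-point round-off, so the recursive path reaches the same weights as joint training on the linear head while keeping only a `d × d` matrix from earlier steps.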

Figures (2)

  • Figure 1: Illustration of the class-incremental semantic segmentation learning process. At each step, the model is trained on new classes while retaining knowledge of previously learned classes. For example, the model first learns the "airplane" class; later steps introduce categories such as "person," "car," and "chair," among others, while the model keeps its understanding of the earlier classes.
  • Figure 2: Overview of the proposed method CFSSeg. In step $t$, the model from step $t-1$ is used to generate pseudo labels via Pseudo Labeling, which are then combined with ground truth labels to form mixed labels. The model inherits the classification head $\hat{\mathbf{\Phi}}_{t-1}$ learned in step $t-1$, and combines it with the mixed labels, the extracted features $\mathbf{E}_t$, and $\mathbf{\Psi}_{t-1}$ from step $t-1$. The C-RLS algorithm is then used to update and obtain $\hat{\mathbf{\Phi}}_{t}$ and $\mathbf{\Psi}_{t}$.
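The mixed-label construction described in the Figure 2 caption can be sketched as follows. This is an illustration under assumptions, not the paper's procedure: it uses an entropy threshold as the uncertainty measure, treats label 0 as background, and only overwrites background pixels; the names `mix_labels`, `bg_id`, and `max_entropy` are hypothetical.

```python
import numpy as np

def mix_labels(gt, old_probs, bg_id=0, max_entropy=0.5):
    """Fill background pixels of the current ground truth with confident
    predictions of the previous-step model.

    gt:        (H, W) integer labels of the current step (old classes appear as background)
    old_probs: (H, W, C_old) softmax output of the previous-step model
    """
    pseudo = old_probs.argmax(axis=-1)
    entropy = -(old_probs * np.log(old_probs + 1e-12)).sum(axis=-1)
    confident = (entropy < max_entropy) & (pseudo != bg_id)
    mixed = gt.copy()
    replace = (gt == bg_id) & confident      # never overwrite new-class ground truth
    mixed[replace] = pseudo[replace]
    return mixed

# Toy 2x2 image: class 3 is new at this step; the old model knows classes 0..2.
gt = np.array([[0, 3],
               [0, 0]])
old_probs = np.array([
    [[0.02, 0.96, 0.02], [0.10, 0.10, 0.80]],   # confident class 1 | pixel has new-class label
    [[0.30, 0.40, 0.30], [0.98, 0.01, 0.01]],   # too uncertain     | confident background
])
mixed = mix_labels(gt, old_probs)
# mixed is [[1, 3], [0, 0]]
```

Only the top-left pixel receives a pseudo label: it is background in the current ground truth and the old model predicts class 1 with low entropy.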

Theorems & Definitions (3)

  • Theorem 1
  • proof
  • proof