CFSSeg: Closed-Form Solution for Class-Incremental Semantic Segmentation of 2D Images and 3D Point Clouds
Jiaxu Li, Rui Li, Jianyu Qi, Songning Lai, Linpu Lv, Kejia Fan, Jianheng Tang, Yutao Yue, Dongzhan Zhou, Yuanhuai Liu, Huiping Zhuang
TL;DR
CFSSeg is a gradient-free, closed-form method for class-incremental semantic segmentation of 2D images and 3D point clouds. It freezes a pretrained encoder, applies a high-dimensional random feature expansion (RHL) to boost plasticity, and updates the classifier head with a recursive closed-form ridge regression (C-RLS). A memory matrix $\mathbf{\Psi}_t$ accumulates information from past steps, so no past samples need to be stored and learning stays exemplar-free. Uncertainty-based pseudo-labeling mitigates semantic drift in 2D, while BALD-guided uncertainty combined with KNN context handles drift in 3D. On Pascal VOC2012, S3DIS, and ScanNet, CFSSeg achieves state-of-the-art performance with significantly reduced training time (a single pass over the data per step) and improved data privacy, making it practical for real-time and privacy-constrained deployments.
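The recursive ridge-regression idea summarized above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The class name `RecursiveRidgeHead`, the Gaussian projection with ReLU standing in for the random feature expansion, the regularizer `gamma`, and the treatment of `psi` as the inverse regularized feature autocorrelation matrix are all assumptions made for illustration. The paper's exact C-RLS formulation and its pseudo-labeling steps may differ.

```python
import numpy as np

class RecursiveRidgeHead:
    """Hypothetical classifier head updated by block recursive least squares."""

    def __init__(self, feat_dim, hidden_dim, gamma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random projection (assumed stand-in for the random expansion).
        self.proj = rng.standard_normal((feat_dim, hidden_dim)) / np.sqrt(feat_dim)
        # Memory matrix: starts as (gamma * I)^-1 and absorbs each step's features.
        self.psi = np.eye(hidden_dim) / gamma
        # Classifier weights; columns are added as new classes arrive.
        self.W = np.zeros((hidden_dim, 0))

    def expand(self, feats):
        # Random expansion followed by ReLU (the activation is an assumption).
        return np.maximum(feats @ self.proj, 0.0)

    def fit_step(self, feats, labels, num_classes):
        """One pass over the current step's features; no past samples needed."""
        H = self.expand(feats)
        Y = np.eye(num_classes)[labels]  # one-hot targets
        # Grow W with zero columns for classes first seen at this step.
        new_cols = num_classes - self.W.shape[1]
        if new_cols > 0:
            self.W = np.hstack([self.W, np.zeros((self.W.shape[0], new_cols))])
        # Woodbury identity: update the inverse without revisiting old data.
        K = np.linalg.solve(np.eye(H.shape[0]) + H @ self.psi @ H.T, H @ self.psi)
        self.psi = self.psi - self.psi @ H.T @ K
        # Correct the weights using the residual on the new data only.
        self.W = self.W + self.psi @ H.T @ (Y - H @ self.W)

    def predict(self, feats):
        return np.argmax(self.expand(feats) @ self.W, axis=1)
```

Here `feats` would be the per-pixel or per-point features from the frozen encoder, flattened to shape `(N, feat_dim)`. Under these assumptions, calling `fit_step` once per incremental step yields the same weights as a ridge regression fitted jointly on the data from all steps. This is the sense in which a closed-form update avoids forgetting without storing samples.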
Abstract
2D images and 3D point clouds are foundational data types for multimedia applications, including real-time video analysis, augmented reality (AR), and 3D scene understanding. Class-incremental semantic segmentation (CSS) requires incrementally learning new semantic categories while retaining prior knowledge. Existing methods typically rely on computationally expensive training based on stochastic gradient descent (SGD), combined with complex regularization or exemplar replay. However, SGD-based training inevitably overwrites the weights that encode past knowledge, leading to catastrophic forgetting, a problem exacerbated by the pixel- and point-level granularity of segmentation. To address these challenges, we propose CFSSeg, a novel exemplar-free approach built on a closed-form solution that is both practical and theoretically grounded for continual semantic segmentation. The closed-form update eliminates the need for iterative gradient-based optimization and for storing past data, requiring only a single pass through the new samples at each step. It therefore improves computational efficiency and suits dynamic, privacy-sensitive multimedia environments. Extensive experiments on the 2D and 3D benchmark datasets Pascal VOC2012, S3DIS, and ScanNet demonstrate CFSSeg's superior performance.
