CLIMB-3D: Continual Learning for Imbalanced 3D Instance Segmentation

Vishal Thengane, Jean Lahoud, Hisham Cholakkal, Rao Muhammad Anwer, Lu Yin, Xiatian Zhu, Salman Khan

TL;DR

The paper tackles the problem of forgetting and imbalance in class-incremental 3D instance segmentation by proposing CLIMB-3D, a modular framework that integrates Exemplar Replay (ER), a Pseudo-Label Generator (PLG), and Class-Balanced Re-weighting (CBR). It formulates CI-3DIS with disjoint class introductions across tasks and uses top-$K$ pseudo-labels from a frozen model plus class-frequency-based re-weighting to maintain past knowledge while learning new classes, all under memory constraints. Three benchmarking scenarios on ScanNet200 are introduced to reflect frequency, semantic similarity, and random grouping of categories, and the method achieves state-of-the-art gains, including up to 16.76% mAP improvements in 3D instance segmentation and approximately 30% mIoU gains in 3D semantic segmentation. The findings demonstrate robust learning across both frequent and rare classes and highlight the practical impact for real-world continual 3D scene understanding, with code available at the authors' GitHub repository.

Abstract

While 3D instance segmentation (3DIS) has advanced significantly, most existing methods assume that all object classes are known in advance and uniformly distributed. However, this assumption is unrealistic in dynamic, real-world environments where new classes emerge gradually and exhibit natural imbalance. Although some approaches address the emergence of new classes, they often overlook class imbalance, which leads to suboptimal performance, particularly on rare categories. To tackle this, we propose CLIMB-3D, a unified framework for CLass-incremental Imbalance-aware 3DIS. Building upon established exemplar replay (ER) strategies, we show that ER alone is insufficient to achieve robust performance under memory constraints. To mitigate this, we introduce a novel pseudo-label generator (PLG) that extends supervision to previously learned categories by leveraging predictions from a frozen model trained on prior tasks. Despite its promise, PLG tends to be biased towards frequent classes. Therefore, we propose a class-balanced re-weighting (CBR) scheme that estimates object frequencies from pseudo-labels and dynamically adjusts training bias, without requiring access to past data. We design and evaluate three incremental scenarios for 3DIS on the challenging ScanNet200 dataset and additionally validate our method for semantic segmentation on ScanNetV2. Our approach achieves state-of-the-art results, surpassing prior work by up to 16.76% mAP for instance segmentation and approximately 30% mIoU for semantic segmentation, demonstrating strong generalisation across both frequent and rare classes. Code is available at: https://github.com/vgthengane/CLIMB3D
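To make the CBR idea more concrete, the following is a minimal sketch of how per-class weights could be derived from pseudo-label counts alone. It is not the authors' implementation: the function name `class_balanced_weights`, the `smoothing` parameter, and the inverse-frequency form normalised to mean 1 are all assumptions chosen for illustration. The abstract only states that frequencies are estimated from pseudo-labels and used to adjust training bias.

```python
from collections import Counter
from typing import Dict, Iterable


def class_balanced_weights(pseudo_label_classes: Iterable[int],
                           smoothing: float = 1.0) -> Dict[int, float]:
    """Map each class seen in the pseudo-labels to a weight.

    Assumption: weights are inverse to the (smoothed) object count and
    rescaled so that their mean is 1. The paper may use a different form.
    """
    counts = Counter(pseudo_label_classes)
    if not counts:
        return {}
    # Rare classes get a larger raw weight than frequent ones.
    raw = {c: 1.0 / (n + smoothing) for c, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {c: w / mean for c, w in raw.items()}


# Toy pseudo-labels: class 0 is frequent, class 2 is rare.
weights = class_balanced_weights([0, 0, 0, 0, 1, 1, 2])
```

In this toy case the rare class 2 receives the largest weight and the frequent class 0 the smallest, which is the direction of correction the abstract describes. No stored data from earlier tasks is needed, since the counts come from the frozen model's predictions.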

Paper Structure

This paper contains 17 sections, 3 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Overview of the CI-3DIS setting. New object categories are introduced incrementally with each task. After every phase, the model must recognise both newly added and previously learned classes. For instance, in Task 2, categories such as Pillow, Coffee Table, and Sofa Chair are introduced, while the model is expected to retain recognition of earlier classes like Table, Chair, and Couch.
  • Figure 2: Overview of CLIMB-3D for CI-3DIS. The model incrementally learns new classes across sequential phases. During task $t$, point clouds $\mathbf{P}$ and their corresponding labels $\mathbf{Y}^t$ are sampled from a combination of the current training dataset $D^t$ and Exemplar Replay (ER), which maintains a small memory of past examples. These are then passed to the current model $\Phi^t$ to produce predicted labels $\mathbf{Y}^t$. The Pseudo-Label Generator (PLG) selects the top-$K$ predictions from the previous model $\Phi^{t-1}$. These pseudo-labels are then weighted based on class frequency $f(c)$ using Class-Balanced Re-weighting (CBR), and the top-$K$ re-weighted labels are selected to form the balanced pseudo-label set $\bar{\mathbf{Y}}^t$. This pseudo-label set is then concatenated with the ground-truth labels to form a final augmented supervision set $\overline{\mathbf{Y}}^t$ for task $t$, which is used to optimise the model $\Phi^t$ with the overall training objective (the equation labelled eq:climb3d in the paper).
  • Figure 3: Tasks are grouped into incremental scenarios based on object frequency, semantic similarity, and random assignment. Distinct markers denote different tasks; shapes indicate object categories; a separate marker denotes the background. Left: Grouped by category frequency. Middle: Grouped by semantic similarity (e.g., similar shapes). Right: Randomly grouped, mixing semantic and frequency variations.
  • Figure 4: Qualitative comparison of ground truth, the baseline method, and our proposed framework on the Split-A evaluation after learning all tasks.
  • Figure 5: Qualitative comparison of ground truth, the baseline method, and our proposed framework on the Split-B evaluation after learning all tasks.
  • ...and 1 more figure
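The supervision flow described in the Figure 2 caption (top-$K$ predictions from the previous model, re-weighting by class frequency $f(c)$, a second top-$K$ selection, then concatenation with the ground truth) can be sketched roughly as follows. This is an illustrative sketch only: the `Instance` type, the function `build_supervision`, the use of two separate cut-offs `k_candidates` and `k_final`, and the choice of dividing the confidence by $f(c)$ are assumptions, not details confirmed by the paper.

```python
from typing import Dict, List, NamedTuple


class Instance(NamedTuple):
    class_id: int
    score: float  # model confidence; ground-truth instances use 1.0 here


def build_supervision(ground_truth: List[Instance],
                      prev_predictions: List[Instance],
                      class_freq: Dict[int, int],
                      k_candidates: int,
                      k_final: int) -> List[Instance]:
    """Combine ground truth with balanced pseudo-labels from the frozen model."""
    # PLG step: keep the most confident predictions of the previous model.
    candidates = sorted(prev_predictions,
                        key=lambda i: i.score, reverse=True)[:k_candidates]

    # CBR step (assumed form): down-weight frequent classes via score / f(c).
    def balanced_score(inst: Instance) -> float:
        return inst.score / max(class_freq.get(inst.class_id, 1), 1)

    balanced = sorted(candidates, key=balanced_score, reverse=True)[:k_final]

    # Final augmented supervision: ground truth plus balanced pseudo-labels.
    return list(ground_truth) + balanced


# Toy example: class 0 is frequent, classes 1 and 2 are rarer.
supervision = build_supervision(
    ground_truth=[Instance(3, 1.0)],
    prev_predictions=[Instance(0, 0.9), Instance(0, 0.8), Instance(1, 0.6),
                      Instance(2, 0.5), Instance(0, 0.3)],
    class_freq={0: 100, 1: 10, 2: 2},
    k_candidates=4,
    k_final=2,
)
```

In the toy example, selection by confidence alone would keep the two class-0 predictions, whereas the re-weighted selection keeps the class-2 and class-1 instances. This illustrates why the caption places a second top-$K$ step after re-weighting.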