Robust3D-CIL: Robust Class-Incremental Learning for 3D Perception

Jinge Ma, Jiangpeng He, Fengqing Zhu

TL;DR

Robust3D-CIL addresses robust class-incremental learning (CIL) for 3D point clouds under unknown corruption, a realistic setting where replaying the full clean dataset is impractical. It couples a farthest exemplar selection strategy with point cloud downsampling based on Farthest Point Sampling (FPS), which improves the diversity of the replay buffer and lets it hold more replay exemplars without increasing memory. Experiments on ModelNet40, OmniObject3D, and Objaverse-LVIS show consistent overall accuracy (OA) gains of 2%–11% over replay-based baselines; the gains grow with the number of tasks and hold across backbones. The approach offers a practical route to robust, continual 3D perception in real-world streaming scenarios and can be integrated with existing CIL methods to improve their performance under data corruption.

Abstract

3D perception plays a crucial role in real-world applications such as autonomous driving, robotics, and AR/VR. In practical scenarios, 3D perception models must continuously adapt to new data and emerging object categories, but retraining from scratch incurs prohibitive costs, which makes class-incremental learning (CIL) essential. However, real-world 3D point cloud data often include corrupted samples, which pose significant challenges for existing CIL methods and lead to more severe forgetting on corrupted data. To address these challenges, we consider the scenario in which a CIL model is updated using point clouds with unknown corruption, to better simulate real-world conditions. Inspired by Farthest Point Sampling, we propose a novel exemplar selection strategy that preserves intra-class diversity when selecting replay exemplars, mitigating the forgetting induced by data corruption. Furthermore, we introduce a point cloud downsampling-based replay method that uses the limited replay buffer memory more efficiently, further enhancing the model's continual learning ability. Extensive experiments demonstrate that our method improves the performance of replay-based CIL baselines by 2% to 11%, indicating its effectiveness and its potential for real-world 3D applications.
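The downsampling idea in the abstract can be illustrated with a minimal NumPy sketch of greedy Farthest Point Sampling applied to the points of a single exemplar, plus a rough estimate of how many exemplars fit in a fixed buffer. This is not the authors' implementation. The function names, the random starting point, and the parameters `frac_downsampled` (fraction of exemplars stored downsampled) and `keep_ratio` (fraction of points kept per downsampled exemplar) are assumptions made for illustration, and the buffer is assumed to be measured in stored points.

```python
import numpy as np


def farthest_point_sampling(points: np.ndarray, n_keep: int, seed: int = 0) -> np.ndarray:
    """Return indices of n_keep points chosen by greedy farthest point sampling."""
    n = points.shape[0]
    if not 0 < n_keep <= n:
        raise ValueError("n_keep must be between 1 and the number of points")
    rng = np.random.default_rng(seed)
    chosen = np.empty(n_keep, dtype=np.int64)
    chosen[0] = rng.integers(n)  # arbitrary starting point (an assumption)
    # Distance from every point to its nearest already-chosen point.
    min_dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for i in range(1, n_keep):
        # Pick the point that is farthest from everything chosen so far.
        chosen[i] = int(np.argmax(min_dist))
        new_dist = np.linalg.norm(points - points[chosen[i]], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return chosen


def exemplars_in_budget(budget_points: int, points_per_cloud: int,
                        frac_downsampled: float, keep_ratio: float) -> int:
    """Approximate number of exemplars that fit in a buffer measured in points.

    Hypothetical accounting: a fraction `frac_downsampled` of the exemplars
    keeps only `keep_ratio` of its points; the rest are stored in full.
    """
    avg_cost = points_per_cloud * (1.0 - frac_downsampled * (1.0 - keep_ratio))
    return int(budget_points // avg_cost)


# Usage: downsample one synthetic cloud from 1024 to 256 points.
cloud = np.random.default_rng(1).normal(size=(1024, 3))
kept = cloud[farthest_point_sampling(cloud, 256)]
```

Under this accounting, a buffer sized for 20 full clouds of 1024 points holds about 32 exemplars when half of them are stored at a quarter of their points, since the average cost per exemplar drops from 1024 to 640 points.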

Paper Structure

This paper contains 35 sections, 4 equations, 7 figures, 5 tables, 2 algorithms.

Figures (7)

  • Figure 1: CIL performance of PointNeXt [qian2022pointnext] on partially corrupted ModelNet40 [wu20153d]. The solid orange line (replay) is a weighted average of the two dashed orange lines (replay-clean and replay-corrupt). For details of the experimental setup, see the scenario section.
  • Figure 2: (a) Visualizations of a clean point cloud and several common corruption types: I clean, II rotate, III scale, IV jitter, V add global, VI add local, VII dropout global, and VIII dropout local. (b) The training setup of Robust3D-CIL, where each class $C_k$ of the training point clouds includes a proportion $\rho_k$ of clean point clouds ($\rho_k \in [0.5, 0.95]$), with the remainder being randomly corrupted point clouds of various types. Neither which training point clouds are corrupted nor the value of $\rho_k$ is visible to the model, simulating the real-world scenario where training point clouds may contain unknown corruptions. (c) The testing setup of Robust3D-CIL, where each class includes both clean point clouds and an equal proportion of each type of corrupted point cloud, to better assess the model's robustness across various test data conditions.
  • Figure 3: An overview of our method: after completing the learning of task $t$, we use the current model to extract point cloud features and then apply farthest exemplar selection to choose replay exemplars that are both representative and diverse within each class. We then downsample a proportion $\alpha$ of the exemplars with downsampling rate $r$, which reduces the storage they require so that more replay exemplars fit within the limited replay buffer memory. For most baseline models, other losses such as a knowledge distillation loss are included in addition to the classification loss.
  • Figure 4: A 2D visualization of the feature distribution for a particular class. (a) shows herding selection, in which at each step the sample closest to the center of the remaining unselected samples is chosen as the exemplar, resulting in a predominance of clean samples. (b) shows our proposed farthest selection, where each selected exemplar suppresses the likelihood of nearby samples being chosen, driving the exemplars to better cover the entire distribution and capture the intra-class variance.
  • Figure 5: These plots show the performance of the baseline models and our method on ModelNet40-Mix. The left plot has 10 tasks, and the right plot has 4 tasks. Our method significantly improves the robustness of the baseline models.
  • ...and 2 more figures
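
The contrast drawn in the Figure 4 caption can be sketched in a few lines of NumPy. The sketch below is a simplified, deterministic illustration and not the authors' algorithm. `herding_like_selection` follows the caption's description of herding literally, while `farthest_exemplar_selection` assumes a greedy farthest-first rule in feature space that starts from the sample nearest the class mean. Both function names and the starting rule are assumptions, and the paper's actual selection rule (which the caption describes in terms of likelihood) may differ.

```python
import numpy as np


def herding_like_selection(features, m):
    """At each step, pick the sample closest to the mean of the unselected samples."""
    feats = np.asarray(features, dtype=np.float64)
    if not 0 < m <= feats.shape[0]:
        raise ValueError("m must be between 1 and the number of samples")
    remaining = list(range(feats.shape[0]))
    selected = []
    for _ in range(m):
        center = feats[remaining].mean(axis=0)
        dist = np.linalg.norm(feats[remaining] - center, axis=1)
        selected.append(remaining.pop(int(np.argmin(dist))))
    return selected


def farthest_exemplar_selection(features, m):
    """Greedy farthest-first selection in feature space (illustrative assumption)."""
    feats = np.asarray(features, dtype=np.float64)
    if not 0 < m <= feats.shape[0]:
        raise ValueError("m must be between 1 and the number of samples")
    # Start from the sample nearest the class mean so the first pick is representative.
    center = feats.mean(axis=0)
    first = int(np.argmin(np.linalg.norm(feats - center, axis=1)))
    selected = [first]
    # Distance from every sample to its nearest selected exemplar.
    min_dist = np.linalg.norm(feats - feats[first], axis=1)
    while len(selected) < m:
        nxt = int(np.argmax(min_dist))  # samples near chosen exemplars score low
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(feats - feats[nxt], axis=1))
    return selected


# Toy class: 90 "clean" features in a tight cluster, 10 "corrupted" ones far away.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 0.1, size=(90, 2))
corrupt = rng.normal(5.0, 0.1, size=(10, 2))
feats = np.vstack([clean, corrupt])

herd = herding_like_selection(feats, 5)
far = farthest_exemplar_selection(feats, 5)
print("herding-like picks from corrupted cluster:", sum(i >= 90 for i in herd))
print("farthest picks from corrupted cluster:", sum(i >= 90 for i in far))
```

On this toy data the herding-like rule stays inside the dense clean cluster, while the farthest-first rule reaches the small distant cluster, which mirrors the coverage behaviour the caption describes.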