Foundation Model-Powered 3D Few-Shot Class Incremental Learning via Training-free Adaptor

Sahar Ahmadi, Ali Cheraghian, Morteza Saberi, Md. Towsif Abir, Hamidreza Dastmalchi, Farookh Hussain, Shafin Rahman

TL;DR

This paper introduces a training-free method for Few-Shot Class Incremental Learning (FSCIL) on 3D point clouds. It combines a foundational 3D model trained extensively on point cloud data with a dual cache system.

Abstract

Recent advances in deep learning for point cloud processing have increased interest in Few-Shot Class Incremental Learning (FSCIL) for 3D computer vision. This paper introduces a new method to tackle the FSCIL problem in 3D point cloud environments. We leverage a foundational 3D model trained extensively on point cloud data. Drawing from recent improvements in foundation models, known for their ability to work well across different tasks, we propose a novel strategy that does not require additional training to adapt to new tasks. Our approach uses a dual cache system: first, it stores previous test samples, selected by how confident the model was in its predictions, to prevent forgetting; second, it includes a small number of new task samples to prevent overfitting. This dynamic adaptation ensures strong performance across different learning tasks without extensive fine-tuning. We tested our approach on ModelNet, ShapeNet, ScanObjectNN, and CO3D, showing that it outperforms other FSCIL methods and demonstrating its effectiveness and versatility. The code is available at https://github.com/ahmadisahar/ACCV_FCIL3D.

Paper Structure

This paper contains 16 sections, 5 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: (a) Existing methods [chen2021incremental, chowdhury2022few, cheraghian2021semantic] for FSCIL typically employ a traditional vision model trained from scratch on the base task, followed by a classifier. Adding novel classes requires fine-tuning with a few novel training samples, often overfitting the novel classes and forgetting the base classes. (b) In contrast, our proposed FSCIL strategy leverages a foundation model pre-trained on a large dataset, which offers strong generalization with minimal effort compared to traditional vision models. Specifically, to incorporate novel classes into the base classes, we introduce a novel strategy that eliminates the need for fine-tuning, thereby reducing both forgetting and overfitting issues. Instead, we use a novel training-free adaptation module to seamlessly integrate novel classes with existing base classes with minimal effort.
  • Figure 2: Feature $\textbf{v}^{t}_{i} \in \mathbb{R}^{m}$ is extracted from input $\mathcal{X}_{i}^{t}$ using vision encoder $V_{e}$. Prompts $\{\textbf{p}_{1}, \textbf{p}_{2}, \cdots, \textbf{p}_{C}\}$ are processed via text encoder $T_{e}$ to obtain features $\{\textbf{e}_{1}, \textbf{e}_{2}, \cdots, \textbf{e}_{C}\}$. These features are concatenated and aligned by module $A$, producing similarity vector $\textbf{a}^{t}_{i}$. This vector is refined by an adaptor module with base task cache $B$ and novel task cache $N$, resulting in the final score $\textbf{z}^{t}_{i}$.
  • Figure 3: (a) Base task cache: This cache stores test samples from the base task to address forgetting issues, selecting samples based on their entropy values. The cache updates when a new test sample has a lower entropy than those currently stored. (b) Novel task cache: This cache contains training samples from few-shot novel classes.
  • Figure 4: Comparison of the harmonic accuracy with SOTA methods on ShapeNet to ScanObjectNN and ModelNet40 to ScanObjectNN datasets.
  • Figure 5: (a) The influence of the caches. (b) The impact of using the relation module after the encoders versus zero-shot.
  • ...and 1 more figures
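
The captions for Figures 2 and 3 describe an entropy-selected base task cache and a few-shot novel task cache that together refine the similarity vector $\textbf{a}^{t}_{i}$ into the final score $\textbf{z}^{t}_{i}$. The sketch below illustrates that idea in plain Python under several assumptions that are not stated in this excerpt: the base cache is kept per predicted class with a fixed `capacity`, the refinement adds a similarity-weighted vote from each cached feature (in the style of cache-based adapters), and the names `EntropyCache`, `refine`, `alpha`, and `beta` are hypothetical. It should not be read as the authors' exact formulation.

```python
import math
from collections import defaultdict


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy; low entropy means a confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class EntropyCache:
    """Hypothetical base task cache: per predicted class, keep the
    `capacity` test features with the lowest prediction entropy."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = defaultdict(list)  # class index -> [(entropy, feature)]

    def update(self, feature, scores):
        """Try to store a test feature; return True if the cache changed."""
        probs = softmax(scores)
        h = entropy(probs)
        label = max(range(len(probs)), key=probs.__getitem__)
        slot = self.slots[label]
        if len(slot) < self.capacity:
            slot.append((h, feature))
            return True
        # Replace the least confident entry only if the new one is more confident.
        worst = max(range(len(slot)), key=lambda i: slot[i][0])
        if h < slot[worst][0]:
            slot[worst] = (h, feature)
            return True
        return False

    def features(self):
        """Return {class index: [feature, ...]} without the entropy values."""
        return {c: [f for _, f in s] for c, s in self.slots.items() if s}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0.0 and nv > 0.0 else 0.0


def refine(scores, feature, base_cache, novel_cache, alpha=1.0, beta=5.0):
    """Assumed refinement rule: add a similarity-weighted vote from every
    cached feature to the score of that feature's class."""
    refined = list(scores)
    for cache in (base_cache, novel_cache):
        for label, feats in cache.items():
            for key in feats:
                affinity = math.exp(-beta * (1.0 - cosine(feature, key)))
                refined[label] += alpha * affinity
    return refined
```

In this sketch the base cache is filled online from test samples via `EntropyCache.update`, while the novel cache is simply a dictionary mapping each novel class index to its few-shot training features. Neither step involves gradient updates, which matches the training-free framing of the adaptor.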