iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning

Tom Fischer, Yaoyao Liu, Artur Jesslen, Noor Ahmed, Prakhar Kaushik, Angtian Wang, Alan Yuille, Adam Kortylewski, Eddy Ilg

TL;DR

This paper extends continual learning to robust out-of-distribution (OOD) scenarios by introducing iNeMo (Incremental Neural Mesh Models). iNeMo grows a library of 3D cuboid meshes and maintains a memory that holds the previous backbone and a replay buffer. The method initializes the latent space by partitioning it with an Equiangular Tight Frame (ETF) and applies a positional regularization that keeps each class's features in a fixed region. It combines these with continual training losses and knowledge distillation to prevent forgetting. Empirically, iNeMo outperforms strong 2D baselines by 2–6% in-domain and by 6–50% in OOD settings on Pascal3D+ and ObjectNet3D, and it provides the first results for incremental pose estimation. The work demonstrates the practical value of 3D object-centric representations for robust class-incremental learning and paves the way for joint 3D perception under evolving class inventories.
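The ETF partitioning above rests on a set of class centroids that are pairwise maximally separated on the unit sphere. The following is a minimal NumPy sketch of the standard simplex ETF construction. The function name `simplex_etf`, the random orthonormal basis, and the requirement `dim >= num_classes` are our own assumptions, and the paper's actual initialization may differ in its details.

```python
import numpy as np


def simplex_etf(num_classes: int, dim: int, seed: int = 0) -> np.ndarray:
    """Return (num_classes, dim) unit-norm centroids forming a simplex ETF.

    Every pair of centroids has cosine similarity -1/(num_classes - 1), the most
    negative value that num_classes equal-norm vectors can share, so the
    centroids are maximally far apart on the unit sphere.
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if dim < num_classes:
        raise ValueError("this construction needs dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Random matrix with orthonormal columns, shape (dim, num_classes).
    u, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))
    k = num_classes
    # Centering matrix I - (1/K) 11^T; the sqrt(K/(K-1)) factor makes each column unit-norm.
    centering = np.eye(k) - np.ones((k, k)) / k
    m = np.sqrt(k / (k - 1)) * u @ centering
    return m.T  # one centroid per row


if __name__ == "__main__":
    centroids = simplex_etf(num_classes=12, dim=128)
    gram = centroids @ centroids.T  # ones on the diagonal, -1/11 everywhere else
    print(gram.round(3)[:3, :3])
```

Fixing the centroids for a maximum number of classes up front (for example, `num_classes=100` for a 100-class setting) is what allows feature space to be reserved for classes that have not been seen yet.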

Abstract

Unlike human learning, it is still common practice in vision to train deep learning models only once, on fixed datasets. A variety of recent approaches address continual data streams. However, extending these methods to out-of-distribution (OOD) scenarios has not yet been investigated effectively. At the same time, non-continual neural mesh models have recently been shown to generalize well to such OOD scenarios. To leverage this decisive property in a continual learning setting, we propose incremental neural mesh models that can be extended with new meshes over time. In addition, we present a latent space initialization strategy that allocates feature space for future, unseen classes in advance, and a positional regularization term that forces the features of the different classes to stay consistently within their respective latent space regions. We demonstrate the effectiveness of our method through extensive experiments on the Pascal3D and ObjectNet3D datasets and show that our approach outperforms the baselines for classification by $2$–$6\%$ in the in-domain setting and by $6$–$50\%$ in the OOD setting. Our work also presents the first incremental learning approach for pose estimation. Our code and model can be found at https://github.com/Fischer-Tom/iNeMo.
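The two latent-space mechanisms in the abstract can be illustrated with a toy sketch: sampling initial vertex features from one class's partition of the unit sphere (panel b of Figure 2) and penalizing features that leave their partition (panel c). Both functions are hypothetical. `sample_from_partition` uses rejection sampling under a nearest-centroid definition of a partition, and `partition_hinge_penalty` is a margin-based hinge chosen for illustration; it is not the paper's $\mathcal{L}_{etf}$, whose exact form is not given here. Centroids are assumed to be unit-norm rows (for example, from an ETF).

```python
import numpy as np


def sample_from_partition(centroids: np.ndarray, class_idx: int, n: int,
                          rng: np.random.Generator, batch: int = 4096) -> np.ndarray:
    """Draw n unit vectors, uniform on the sphere, whose nearest centroid is class_idx.

    Hypothetical helper: rejection sampling that keeps only the uniform draws
    falling inside the class partition (nearest centroid by cosine similarity).
    """
    dim = centroids.shape[1]
    kept, count = [], 0
    while count < n:
        x = rng.standard_normal((batch, dim))
        x /= np.linalg.norm(x, axis=1, keepdims=True)  # normalized Gaussians are uniform on the sphere
        nearest = np.argmax(x @ centroids.T, axis=1)    # partition index of each draw
        accepted = x[nearest == class_idx]
        kept.append(accepted)
        count += len(accepted)
    return np.concatenate(kept)[:n]


def partition_hinge_penalty(features: np.ndarray, labels: np.ndarray,
                            centroids: np.ndarray, margin: float = 0.1) -> float:
    """Hypothetical positional penalty (not the paper's L_etf).

    Zero iff every feature is closer (by cosine) to its own class centroid than
    to any other centroid by at least `margin`; grows as features drift out.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = f @ centroids.T                    # (N, K) cosine similarities
    rows = np.arange(len(f))
    own = sims[rows, labels]                  # similarity to the own-class centroid
    others = sims.copy()
    others[rows, labels] = -np.inf            # exclude the own class
    rival = others.max(axis=1)                # closest competing centroid
    return float(np.mean(np.maximum(0.0, rival - own + margin)))
```

Rejection sampling keeps the initial features exactly uniform within the partition, at the cost of discarding roughly $(K-1)/K$ of the draws when the $K$ partitions have equal size.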

Paper Structure (49 sections, 15 equations, 4 figures, 11 tables)

Figures (4)

  • Figure 1: We present iNeMo, which performs class-incremental learning for pose estimation and classification and performs well in out-of-distribution scenarios. Our method receives tasks $\mathcal{T}^i$ over time that consist of images with camera poses for new classes. We build upon Neural Mesh Models (NeMo) wang_nemo and abstract objects as simple cuboid 3D meshes, where each vertex carries a neural feature. The neural meshes are optimized jointly with a 2D feature extractor $\Phi_i$, and render-and-compare is then used to perform pose estimation and classification. We introduce a memory that contains the previous feature extractor $\Phi_{i-1}$ for distillation, a replay buffer $\mathcal{E}^{1:(i-1)}$, and a growing set of neural meshes $\mathfrak{N}$. Our results show that iNeMo outperforms all baselines for incremental learning and is significantly more robust than previous methods.
  • Figure 2: Overview of the regularization. a) The features are constrained to lie on a unit sphere, and the latent space is initially populated uniformly. Centroids $e_i$ are then computed to lie maximally far apart, and the feature population is partitioned for a maximum number of classes. b) When a new task starts, the vertex features of each new cuboid from this task are randomly initialized from a class partition. By projecting the vertex locations into the images, the corresponding image features are determined, as illustrated by the orange star. c) To avoid entanglement, we regularize the latent space by constraining each image feature to stay within its class partition using $\mathcal{L}_{etf}$. d) We then employ the contrastive loss $\mathcal{L}_{\text{cont}}$, which pulls the vertex and image features together and pushes the image feature away from the other vertex features of its own mesh and from those of the other meshes.
  • Figure 3: Comparison of classification performance decay over tasks for our method and the baselines. Top left: results for O3D (100 classes) split into 10 even tasks. Top right: results for P3D (12 classes) split into 4 even tasks. Bottom: results for O-P3D with occlusion levels L1, L2, and L3 after each task. Our method outperforms all other methods. Especially in the occluded cases, it outperforms them by a very large margin of up to $70\%$ and still shows strong performance at the largest occlusion level L3, with $60$–$80\%$ occlusion.
  • Figure 4: Comparison of the task-wise pose estimation accuracy on P3D for $4$ even tasks, shown for the thresholds $\pi/6$ (left) and $\pi/18$ (right). Our method outperforms all other methods and retains high pose estimation accuracy throughout the incremental training process. For pose estimation, performance depends more strongly on the difficulty of the considered classes than on the method's ability to retain knowledge.
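Figure 4 reports pose accuracy at the thresholds $\pi/6$ and $\pi/18$. A common definition in Pascal3D+ viewpoint evaluation, including NeMo-style work, is the fraction of test images whose geodesic rotation error between the predicted and ground-truth rotation falls below the threshold. The sketch below assumes that definition and 3x3 rotation matrices as input; the helper names are hypothetical, and the paper's evaluation code may parameterize poses differently (for example, by azimuth, elevation, and in-plane rotation).

```python
import numpy as np


def rotation_error(r_pred: np.ndarray, r_gt: np.ndarray) -> float:
    """Geodesic angle (radians) between two 3x3 rotation matrices."""
    # For a rotation by angle t, trace(R) = 1 + 2 cos(t).
    cos_t = (np.trace(r_pred.T @ r_gt) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos_t, -1.0, 1.0)))  # clip guards against rounding error


def pose_accuracy(preds, gts, threshold: float) -> float:
    """Fraction of samples whose rotation error is below `threshold` (radians)."""
    errors = np.array([rotation_error(p, g) for p, g in zip(preds, gts)])
    return float(np.mean(errors < threshold))


def rot_z(t: float) -> np.ndarray:
    """Rotation about the z-axis by angle t, used only for the usage example."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


if __name__ == "__main__":
    gts = [np.eye(3)] * 3
    preds = [rot_z(0.1), rot_z(0.4), rot_z(1.0)]
    print(pose_accuracy(preds, gts, np.pi / 6))   # errors 0.1 and 0.4 are below pi/6 (about 0.524)
    print(pose_accuracy(preds, gts, np.pi / 18))  # only 0.1 is below pi/18 (about 0.175)
```

The stricter $\pi/18$ threshold (10 degrees) separates methods that recover fine orientation from those that only get the coarse viewpoint right, which is why both curves are shown.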