A Fusion of Variational Distribution Priors and Saliency Map Replay for Continual 3D Reconstruction

Sanchar Palit, Sandika Biswas

TL;DR

This work proposes a continual learning-based 3D reconstruction method where the goal is to design a model using Variational Priors that can still reconstruct the previously seen classes reasonably even after training on new classes.

Abstract

Single-image 3D reconstruction is a research challenge focused on predicting 3D object shapes from single-view images. This task requires significant data acquisition to predict both visible and occluded portions of the shape. Furthermore, learning-based methods face the difficulty of creating a comprehensive training dataset for all possible classes. To this end, we propose a continual learning-based 3D reconstruction method where our goal is to design a model using Variational Priors that can still reconstruct the previously seen classes reasonably even after training on new classes. Variational Priors represent abstract shapes and combat forgetting, whereas saliency maps preserve object attributes with less memory usage. This is vital due to resource constraints in storing extensive training data. Additionally, we introduce saliency map-based experience replay to capture global and distinct object features. Thorough experiments show competitive results compared to established methods, both quantitatively and qualitatively.
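
The memory argument for saliency-based replay can be illustrated with a small sketch. It is not the paper's implementation: the function names `compress_by_saliency` and `restore`, the `keep_fraction` parameter, and the choice to store the top-scoring pixels as a sparse dictionary are all assumptions made for illustration. The saliency map is taken as given, since the abstract does not say how it is computed.

```python
def compress_by_saliency(image, saliency, keep_fraction=0.25):
    """Keep only the most salient pixels as a sparse {(row, col): value} dict.

    Assumption: `image` and `saliency` are equally sized 2D lists, and a
    higher saliency score means the pixel matters more for the object.
    """
    coords = [(r, c) for r in range(len(image)) for c in range(len(image[0]))]
    k = max(1, int(len(coords) * keep_fraction))
    # Rank pixel coordinates by saliency score, highest first, and keep k.
    top = sorted(coords, key=lambda rc: saliency[rc[0]][rc[1]], reverse=True)[:k]
    return {(r, c): image[r][c] for r, c in top}


def restore(sparse, height, width, fill=0.0):
    """Rebuild a dense image; pixels that were dropped get the fill value."""
    dense = [[fill] * width for _ in range(height)]
    for (r, c), value in sparse.items():
        dense[r][c] = value
    return dense


if __name__ == "__main__":
    image = [[1.0, 2.0], [3.0, 4.0]]
    saliency = [[0.1, 0.9], [0.2, 0.8]]
    exemplar = compress_by_saliency(image, saliency, keep_fraction=0.5)
    print(restore(exemplar, 2, 2))
```

With `keep_fraction=0.5`, only half of the pixels are stored per exemplar, which is the kind of saving that makes a replay buffer affordable under a fixed memory budget.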

Paper Structure

This paper contains 20 sections, 11 equations, 11 figures, 5 tables, 1 algorithm.

Figures (11)

  • Figure 1: Catastrophic forgetting in learning-based 3D reconstruction models such as Occupancy Network [mescheder2019occupancy]. Initially, the model is trained on classes such as airplane, chair, and speaker, and reconstructs these shapes successfully. After further training on a new class, such as car, the model reconstructs the car, but its reconstructions of the previously learned shapes become inaccurate, both with and without EWC [kirkpatrick2017overcoming], a continual learning-based method. In contrast, our method faithfully reproduces the earlier shapes even after training on new objects, thus mitigating catastrophic forgetting.
  • Figure 2: Update of the training loss during an incremental session. The model is trained simultaneously on generated pseudo-images and on the current session's dataset. The variational latent encoder has acquired latent distributions corresponding to distinct shapes for each session. Let $\mathcal{Q}_{i,j}(\mathcal{Z})$ denote the latent distribution for the $j$-th class of the $i$-th session, so that $\mathcal{Q}_{1,1}(\mathcal{Z}), \dots, \mathcal{Q}_{\mathcal{M}, m}(\mathcal{Z})$ are the latent distributions of the shapes encountered up to the $(t-1)$-th session. The mesh is generated from the latent code using Multiresolution IsoSurface Extraction.
  • Figure 3: Mean values of the latent variables for all objects from sessions 0 to 4 of ShapeNet-13. Enlarge for a clearer view.
  • Figure 4: Reconstruction of different objects across the 5 sessions of ShapeNet-13.
  • Figure 5: The saliency maps for various objects acquired during session 0.
  • ...and 6 more figures
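
The replay of stored latent distributions described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each $\mathcal{Q}_{i,j}(\mathcal{Z})$ is a diagonal Gaussian stored as a mean and a log-variance, and the names `LatentPrior`, `PriorMemory`, and `replay_batch` are hypothetical. The step that decodes sampled latent codes into pseudo-images or meshes is omitted.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass


@dataclass
class LatentPrior:
    """Assumed form of Q_{i,j}(Z): a diagonal Gaussian over the latent code."""
    mean: list[float]
    log_var: list[float]

    def sample(self, rng: random.Random) -> list[float]:
        # Reparameterised draw: z = mu + sigma * eps, with eps ~ N(0, 1).
        return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
                for m, lv in zip(self.mean, self.log_var)]


class PriorMemory:
    """Hypothetical store of priors, keyed by (session index, class index)."""

    def __init__(self) -> None:
        self.priors: dict[tuple[int, int], LatentPrior] = {}

    def add(self, session: int, cls: int, prior: LatentPrior) -> None:
        self.priors[(session, cls)] = prior

    def replay_batch(self, current_session: int, n_per_class: int,
                     rng: random.Random) -> list[tuple[tuple[int, int], list[float]]]:
        # Draw latent codes only from priors learned before the current session.
        batch = []
        for key, prior in sorted(self.priors.items()):
            if key[0] < current_session:
                batch.extend((key, prior.sample(rng)) for _ in range(n_per_class))
        return batch


if __name__ == "__main__":
    rng = random.Random(0)
    memory = PriorMemory()
    memory.add(0, 0, LatentPrior(mean=[0.0, 1.0], log_var=[-2.0, -2.0]))
    memory.add(1, 0, LatentPrior(mean=[3.0, 3.0], log_var=[-2.0, -2.0]))
    # In session 1, only the session-0 prior is replayed.
    for key, z in memory.replay_batch(current_session=1, n_per_class=2, rng=rng):
        print(key, z)
```

Storing only a mean and a log-variance per class keeps the memory cost independent of how many training images each earlier class had, which is the appeal of replaying from priors over replaying raw data.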