View Transformation Robustness for Multi-View 3D Object Reconstruction with Reconstruction Error-Guided View Selection

Qi Zhang, Zhouhang Luo, Tao Yu, Hui Huang

TL;DR

This work tackles view transformation robustness in multi-view 3D object reconstruction by linking reconstruction errors to informative viewpoints. It introduces a reconstruction error-guided view selection module that identifies camera poses likely to cover reconstruction errors, and a Stable Diffusion-based view synthesis module that generates the corresponding novel views for training. The approach is evaluated on a purpose-built ShapeNet-VTR dataset with aligned, hemispherical, and spherical view ranges, and shows substantial gains over state-of-the-art methods for both CNN- and transformer-based reconstruction models, with ablations confirming the effectiveness of the $K$-step viewpoint discretization and the view pool. The results imply that robustness to view transformations can be achieved without extra inference cost by using large vision models as a training data platform, and that the gains also generalize to real-world data such as Pix3D chairs.

Abstract

View transformation robustness (VTR), a model's stability under inputs with varying view transformations, is critical for deep-learning-based multi-view 3D object reconstruction. However, existing research has seldom addressed VTR in this setting. One direct way to improve a model's VTR is to produce data with more view transformations and add them to training. Recent progress on large vision models, particularly Stable Diffusion models, has shown great potential for generating 3D models or synthesizing novel-view images from only a single input image. However, directly deploying these models at inference consumes heavy computation resources, and their robustness to view transformations is not guaranteed either. To exploit the power of Stable Diffusion models without extra inference cost, we propose to use them to generate novel views for training, which improves view transformation robustness. Instead of synthesizing random views, we propose a reconstruction error-guided view selection method, which considers the spatial distribution of reconstruction errors in the 3D predictions and chooses the views that cover as many of those errors as possible. The methods are trained and tested on sets with large view transformations to validate the 3D reconstruction models' robustness to view transformations. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art 3D reconstruction methods and other view transformation robustness comparison methods. Code is available at: https://github.com/zqyq/VTR.
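
To make the idea of error-guided view selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes voxel-grid predictions, a $K \times K$ azimuth/elevation grid of candidate viewpoints, and a purely illustrative coverage rule: a view "covers" an error voxel when the voxel's direction from the grid center lies within a fixed angle of the view direction. The function names (`candidate_views`, `error_points`, `select_views`) and all parameters are hypothetical, and the greedy max-coverage strategy is a stand-in chosen for simplicity.

```python
import numpy as np

def candidate_views(k_steps, elev_range=(0.0, 90.0)):
    """Unit view directions on a K x K azimuth/elevation grid (assumed discretization)."""
    az = np.deg2rad(np.linspace(0.0, 360.0, k_steps, endpoint=False))
    el = np.deg2rad(np.linspace(elev_range[0], elev_range[1], k_steps))
    az, el = np.meshgrid(az, el, indexing="ij")
    dirs = np.stack(
        [np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)], axis=-1
    )
    return dirs.reshape(-1, 3)

def error_points(pred, gt, threshold=0.5):
    """Centered coordinates of voxels where the thresholded prediction disagrees with gt."""
    wrong = (pred > threshold) != (gt > 0.5)
    idx = np.argwhere(wrong).astype(float)
    center = (np.array(pred.shape) - 1) / 2.0
    return idx - center

def select_views(err_pts, views, n_select, max_angle_deg=45.0):
    """Greedy max coverage under the assumed angular coverage rule.

    Returns indices into `views`, stopping early once no view adds coverage.
    """
    norms = np.linalg.norm(err_pts, axis=1)
    keep = norms > 0  # a voxel exactly at the center has no direction
    unit = err_pts[keep] / norms[keep, None]
    # covers[v, e] is True when error point e is within max_angle_deg of view v.
    covers = views @ unit.T >= np.cos(np.deg2rad(max_angle_deg))
    uncovered = np.ones(unit.shape[0], dtype=bool)
    chosen = []
    for _ in range(n_select):
        gains = (covers & uncovered).sum(axis=1)
        best = int(np.argmax(gains))
        if gains[best] == 0:
            break
        chosen.append(best)
        uncovered &= ~covers[best]
    return chosen

# Toy usage: a cube with spurious occupied voxels on its +x side.
gt = np.zeros((8, 8, 8))
gt[2:6, 2:6, 2:6] = 1.0
pred = gt.copy()
pred[7, 3:5, 4:6] = 1.0  # false positives
views = candidate_views(k_steps=8)
chosen = select_views(error_points(pred, gt), views, n_select=3)
```

In a training loop of the kind the abstract describes, the selected directions would then be passed to a novel-view synthesis model, and the generated images added to the fine-tuning set. A real coverage test would more plausibly account for visibility and occlusion than this angular rule does.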

Paper Structure

This paper contains 16 sections, 3 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: The method's main idea is to utilize the proposed reconstruction error-guided view selection method for selecting and generating viewpoints covering the most errors.
  • Figure 2: The pipeline of the method: first, the reconstruction error-guided view selection module chooses the views covering the most reconstruction errors; then, the Stable Diffusion model serves as a strong data platform for producing multi-view images at the selected viewpoints; finally, the new view images are added to fine-tune the 3D object reconstruction model. These steps are performed alternately during training to increase the 3D reconstruction models' robustness to view transformations.
  • Figure 3: The view selection module: select the views covering the most errors, then sample based on the selected views.
  • Figure 4: The 3D reconstruction results of Zero-1-to-3, which has weak robustness to view transformations.
  • Figure 5: The viewpoint distribution examples of our ShapeNet-VTR dataset. The blue dot is the object center.
  • ...and 2 more figures