View Transformation Robustness for Multi-View 3D Object Reconstruction with Reconstruction Error-Guided View Selection
Qi Zhang, Zhouhang Luo, Tao Yu, Hui Huang
TL;DR
This work tackles view transformation robustness in multi-view 3D object reconstruction by linking reconstruction errors to informative viewpoints. It introduces two modules. A reconstruction error-guided view selection module identifies camera poses likely to cover reconstruction errors, and a Stable Diffusion-based view synthesis module generates the corresponding novel views for training. The approach is evaluated on a purpose-built ShapeNet-VTR dataset with aligned, hemispherical, and spherical view ranges. It shows substantial gains over state-of-the-art methods for both CNN- and transformer-based reconstruction models, and ablations confirm the effectiveness of the $K$-step viewpoint discretization and the view pool. The results suggest two things: robustness to view transformations can be achieved without extra inference cost by using large vision models as a source of training data, and the approach generalizes to real-world data such as Pix3D chairs.
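The summary does not define the $K$-step viewpoint discretization. The sketch below assumes it means splitting azimuth and elevation into $K$ equal steps over the chosen view range, which yields a finite set of candidate camera poses. The function name `discretize_viewpoints` and the grid layout are hypothetical. The aligned view range is left out because the summary does not say how it is defined.

```python
def discretize_viewpoints(k: int, view_range: str = "spherical"):
    """Return candidate (azimuth, elevation) pairs in degrees on a K-step grid.

    Hypothetical sketch: azimuth is split into k equal steps over [0, 360);
    elevation is split into k evenly spaced values over [0, 90] for a
    hemisphere or [-90, 90] for a full sphere.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if view_range == "hemispherical":
        lo, hi = 0.0, 90.0
    elif view_range == "spherical":
        lo, hi = -90.0, 90.0
    else:
        raise ValueError(f"unsupported view range: {view_range}")

    azimuths = [360.0 * i / k for i in range(k)]
    elevations = [lo + (hi - lo) * j / (k - 1) for j in range(k)]

    poses = []
    for el in elevations:
        if abs(el) == 90.0:
            # At a pole every azimuth gives the same viewing direction, so keep one pose.
            poses.append((0.0, el))
        else:
            poses.extend((az, el) for az in azimuths)
    return poses


if __name__ == "__main__":
    grid = discretize_viewpoints(4, "hemispherical")
    print(len(grid))  # 3 elevations x 4 azimuths + 1 pole = 13 poses
```

A finer $K$ gives more candidate poses and a denser pool to choose from, but it also increases the cost of scoring each candidate.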
Abstract
View transformation robustness (VTR) is critical for deep-learning-based multi-view 3D object reconstruction models. It measures how stable a model's predictions remain when its inputs undergo various view transformations. However, existing research has seldom focused on VTR in multi-view 3D object reconstruction. One direct way to improve a model's VTR is to generate data with more view transformations and add it to training. Recent progress on large vision models, particularly Stable Diffusion models, has opened up great potential for generating 3D models or synthesizing novel-view images from only a single input image. However, deploying these models directly at inference time is computationally expensive, and their robustness to view transformations is not guaranteed either. To exploit the power of Stable Diffusion models without adding inference cost, we propose to use them to generate novel views for training, improving view transformation robustness. Instead of synthesizing random views, we propose a reconstruction error-guided view selection method. It considers the spatial distribution of reconstruction errors in the 3D predictions and chooses the views that cover these errors as much as possible. The methods are trained and tested on sets with large view transformations to validate the 3D reconstruction models' robustness to view transformations. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art 3D reconstruction methods and other view transformation robustness comparison methods. Code is available at: https://github.com/zqyq/VTR.
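The idea of choosing views that cover the reconstruction errors "as much as possible" can be illustrated as a greedy weighted-coverage problem. The sketch below is a hypothetical stand-in, not the authors' implementation. It assumes reconstruction errors are given as weighted 3D points relative to the object center. A point counts as covered by a candidate camera direction when the cosine of the angle between them exceeds a threshold. This is a crude visibility proxy that ignores occlusion. The function then repeatedly picks the view that covers the most remaining error weight. The names `select_views` and `cos_threshold` are illustrative.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def _normalize(v: Vec) -> Vec:
    n = math.sqrt(sum(c * c for c in v))
    return (v[0] / n, v[1] / n, v[2] / n)


def select_views(error_points: Sequence[Tuple[Vec, float]],
                 candidate_dirs: Sequence[Vec],
                 num_views: int,
                 cos_threshold: float = 0.0) -> List[int]:
    """Greedily pick candidate views covering the most remaining error weight.

    Hypothetical visibility proxy: error point p (relative to the object center)
    is covered by a camera looking from unit direction d when
    cos(angle(p, d)) > cos_threshold.
    """
    dirs = [_normalize(d) for d in candidate_dirs]

    # coverage[i] holds the indices of error points visible from candidate i.
    coverage = []
    for d in dirs:
        covered = set()
        for idx, (p, _) in enumerate(error_points):
            norm = math.sqrt(sum(c * c for c in p))
            if norm > 0 and sum(a * b for a, b in zip(p, d)) / norm > cos_threshold:
                covered.add(idx)
        coverage.append(covered)

    uncovered = set(range(len(error_points)))
    chosen: List[int] = []
    for _ in range(min(num_views, len(dirs))):
        best: Optional[int] = None
        best_gain = 0.0
        for i, cov in enumerate(coverage):
            if i in chosen:
                continue
            # Gain is the total error weight this view would newly cover.
            gain = sum(error_points[j][1] for j in cov & uncovered)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining view covers any uncovered error
            break
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen


if __name__ == "__main__":
    errors = [((1.0, 0.0, 0.0), 5.0), ((0.0, 1.0, 0.0), 1.0), ((-1.0, 0.0, 0.0), 3.0)]
    dirs = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
    print(select_views(errors, dirs, 2, cos_threshold=0.5))  # → [0, 2]
```

In this example the heaviest errors lie on the +x and -x sides, so the +x and -x cameras are chosen before the +y camera. Greedy selection is a standard approximation for weighted coverage. The actual method may measure errors, visibility, and view scores differently.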
