CAD-NeRF: Learning NeRFs from Uncalibrated Few-view Images by CAD Model Retrieval

Xin Wen, Xuening Zhu, Renjiao Yi, Zhifeng Wang, Chenyang Zhu, Kai Xu

TL;DR

CAD-NeRF addresses reconstructing neural radiance fields from very few uncalibrated views by leveraging a ShapeNet-based CAD mini-library to bootstrap initial geometry and poses. It introduces a multi-view pose retrieval strategy that respects input order, a deformation-based density refinement, and joint optimization of density, pose, and texture in a self-supervised framework. Evaluations on synthetic and real data show CAD-NeRF achieving robust, high-quality novel-view synthesis and outperforming several state-of-the-art few-shot NeRF methods under extreme data scarcity. The approach broadens NeRF applicability to casual, uncalibrated photo collections by exploiting priors from CAD models.

Abstract

Reconstruction from multi-view images is a longstanding problem in 3D vision, where neural radiance fields (NeRFs) have shown great potential and produce realistic renderings of novel views. Currently, most NeRF methods require accurate camera poses, a large number of input images, or both. Reconstructing a NeRF from few-view images without poses is challenging and highly ill-posed. To address this problem, we propose CAD-NeRF, a method that reconstructs a NeRF from fewer than 10 images without any known poses. Specifically, we build a mini library of several CAD models from ShapeNet and render them from many random views. Given sparse-view input images, we retrieve a model and poses from the library to obtain a model of similar shape, which serves as the density supervision and the pose initialization. Here we propose a multi-view pose retrieval method to avoid pose conflicts among views, a new problem that has not arisen in previous uncalibrated NeRF methods. The geometry of the object is then trained under the CAD guidance, with the deformation of the density field and the camera poses optimized jointly. Finally, texture and density are trained and fine-tuned. All training phases are self-supervised. Comprehensive evaluations on synthetic and real images show that CAD-NeRF successfully learns accurate densities even under large deformations from the retrieved CAD models, demonstrating its generalization ability.
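The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each input image and each library rendering has already been reduced to a feature vector, that similarity is cosine similarity, and that poses are plain azimuth angles in degrees. The names `cosine` and `retrieve` and the toy library are hypothetical. The sketch only shows the idea of letting all input views vote for one shared CAD model before reading off a pose per view; it does not model how the paper resolves pose conflicts among views.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(input_feats, library):
    """Pick one CAD model for all views, then one pose per view.

    library maps a model id to a list of (pose, feature) renderings.
    Both the data layout and the scoring rule are assumptions.
    """
    best = None
    for model_id, renders in library.items():
        poses, total = [], 0.0
        for feat in input_feats:
            # Best-matching rendering of this model for the current view.
            pose, score = max(
                ((p, cosine(feat, f)) for p, f in renders),
                key=lambda item: item[1],
            )
            poses.append(pose)
            total += score
        # Keep the model whose renderings explain all views best overall.
        if best is None or total > best[0]:
            best = (total, model_id, poses)
    return best[1], best[2]


# Toy library: two models, each rendered from two azimuths (degrees).
library = {
    "chair": [(0, [1.0, 0.0, 0.0]), (90, [0.0, 1.0, 0.0])],
    "table": [(0, [0.0, 0.0, 1.0]), (90, [0.5, 0.5, 0.7])],
}
views = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
model, poses = retrieve(views, library)
print(model, poses)
```

Summing scores over views means a single ambiguous image cannot select a different model from the rest, which is the property a per-view (single-view) retrieval lacks.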

Paper Structure

This paper contains 14 sections, 9 equations, 23 figures, 6 tables, 1 algorithm.

Figures (23)

  • Figure 1: CAD-NeRF takes few-view images with unknown poses as inputs, and jointly optimizes density, texture, and poses with the help of priors from the CAD mini library.
  • Figure 2: The CAD-NeRF pipeline. Input images are used to retrieve the model and poses from the mini library. The CAD model serves as the supervision to pre-train the initial density field (phase one). In phase two, rays are cast from the retrieved poses and 3D points are sampled along them; the deformation network predicts an offset and a correction for each point, and the poses are optimized at the same time. In phase three, a color network is added and the three networks are trained together.
  • Figure 3: CAD models in the library.
  • Figure 4: Comparisons of single-view retrieval and the proposed multi-view retrieval.
  • Figure 5: Training pipeline of images with backgrounds.
  • ...and 18 more figures
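Phase two in the Figure 2 caption (points sampled along rays, with a per-point offset and correction from a deformation network) can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's formulation: the offset is assumed to warp the query point, the correction is assumed to be added to the density and clamped at zero, and the learned networks are replaced by hand-written stand-in functions. All names (`sample_along_ray`, `deformed_density`, `sphere_density`, `shift_up`) are hypothetical.

```python
def sample_along_ray(origin, direction, near, far, n):
    """Return n evenly spaced points origin + t * direction for t in [near, far]."""
    step = (far - near) / (n - 1) if n > 1 else 0.0
    return [
        tuple(o + (near + i * step) * d for o, d in zip(origin, direction))
        for i in range(n)
    ]


def deformed_density(point, base_density, deform):
    """Query the pre-trained density at a warped point, then correct it.

    deform(point) returns (offset, correction); the additive form and
    the clamp to non-negative density are assumptions of this sketch.
    """
    offset, correction = deform(point)
    warped = tuple(p + o for p, o in zip(point, offset))
    return max(0.0, base_density(warped) + correction)


def sphere_density(p):
    """Stand-in for the phase-one density field: a solid unit sphere."""
    return 1.0 if sum(c * c for c in p) <= 1.0 else 0.0


def shift_up(p):
    """Stand-in for the deformation network: constant offset, no correction."""
    return (0.0, 0.0, 1.0), 0.0


# One ray through the sphere along the z axis, five samples at z = -2 .. 2.
points = sample_along_ray((0.0, 0.0, -2.0), (0.0, 0.0, 1.0), 0.0, 4.0, 5)
densities = [deformed_density(p, sphere_density, shift_up) for p in points]
print(densities)
```

With the constant offset, the occupied interval along the ray moves by one unit, which is the kind of geometric change a learned deformation would apply point by point to adapt the retrieved CAD shape to the observed object.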